Re: zero pages
Dear Jack,

Thank you so much for your help and good wishes :) Things are now as I expected. As it turns out, I had done something very stupid at 4 in the morning, and that is why it wasn't working.

On 1/23/06, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi
>
> You can test your URL before crawling by running the
> org.apache.nutch.net.RegexURLFilter class (in the urlfilter-regexp
> plugin, if you keep the default Nutch config). Also, please note that
> '.' (dot) is a metacharacter in regular expressions, so you can also try
>
> +^http://([a-z0-9]*\.)*bbc\.com/
>
> Good luck!
>
> /Jack
>
> [earlier quoted messages and crawl log snipped]
Re: zero pages
Hi

You can test your URL before crawling by running the
org.apache.nutch.net.RegexURLFilter class (in the urlfilter-regexp
plugin, if you keep the default Nutch config). Also, please note that
'.' (dot) is a metacharacter in regular expressions, so you can also try

+^http://([a-z0-9]*\.)*bbc\.com/

Good luck!

/Jack

On 1/23/06, Shahinul Islam <[EMAIL PROTECTED]> wrote:
> Hello, thank you so much for your email. I have checked the url filter many
> times before even sending the email.
>
> The content of that file is
> +^http://([a-z0-9]*\.)*bbc.com/
> while the content of the flat url file is
> http://www.bbc.com
>
> The strange thing is that it is working with this config; the content of
> that file is
> +^http://([a-z0-9]*\.)*apache.com/
> while the content of the flat url file is
> http://www.apache.com
>
> [earlier quoted messages and crawl log snipped]
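Jack's point about the unescaped dot can be checked in a few lines of plain Java. This is a standalone illustration using java.util.regex, not Nutch code; the host "bbcxcom" is a made-up example. Note that the unescaped pattern matches *more* URLs, not fewer, since '.' matches any character — so it still accepts http://www.bbc.com/, it just also accepts look-alike hosts:

```java
import java.util.regex.Pattern;

public class RegexDotDemo {
    public static void main(String[] args) {
        // Unescaped dot: '.' matches ANY character, so "bbc.com" also
        // matches hosts like "bbcxcom".
        Pattern loose  = Pattern.compile("^http://([a-z0-9]*\\.)*bbc.com/");
        // Escaped dot: '\.' matches only a literal '.'.
        Pattern strict = Pattern.compile("^http://([a-z0-9]*\\.)*bbc\\.com/");

        String intended   = "http://www.bbc.com/";
        String lookAlike  = "http://www.bbcxcom/";   // hypothetical host

        System.out.println(loose.matcher(intended).find());   // true
        System.out.println(loose.matcher(lookAlike).find());  // true  <- unintended
        System.out.println(strict.matcher(intended).find());  // true
        System.out.println(strict.matcher(lookAlike).find()); // false
    }
}
```

Escaping the dot is the correct fix for precision, but since the loose pattern is strictly more permissive, it would not by itself explain getting zero pages.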
Re: zero pages
Hello, thank you so much for your email. I have checked the url filter many times before even sending the email.

The content of that file is

+^http://([a-z0-9]*\.)*bbc.com/

while the content of the flat url file is

http://www.bbc.com

The strange thing is that it is working with this config; the content of that file is

+^http://([a-z0-9]*\.)*apache.com/

while the content of the flat url file is

http://www.apache.com

On 1/23/06, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi
>
> 060123 081549 found resource crawl-urlfilter.txt at
> file:/D:/nutch-0.7.1/conf/crawl-urlfilter.txt
> .060123 081549 Added 0 pages  <-- no seeds found at all
>
> Please check your url filters.
>
> /Jack
>
> [earlier quoted message and crawl log snipped]
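For anyone following along, the accept/reject behaviour of a crawl-urlfilter.txt file can be sketched in a few lines. This is a minimal re-implementation for illustration only (not Nutch's actual RegexURLFilter, and the class name is made up): each non-comment line is '+' or '-' followed by a regex, the first rule whose regex matches the URL decides, and a URL matching no rule is rejected — which is why a seed URL that slips past every '+' rule yields "Added 0 pages".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // Each rule: accept flag ('+' line) plus its compiled regex.
    private final List<Map.Entry<Boolean, Pattern>> rules = new ArrayList<>();

    public UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip comments
            boolean accept = line.charAt(0) == '+';
            rules.add(new AbstractMap.SimpleEntry<>(
                    accept, Pattern.compile(line.substring(1))));
        }
    }

    public boolean accepts(String url) {
        // First matching rule wins; no match means reject.
        for (Map.Entry<Boolean, Pattern> rule : rules) {
            if (rule.getValue().matcher(url).find()) return rule.getKey();
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch f = new UrlFilterSketch(Arrays.asList(
                "-\\.(gif|jpg|png)$",                  // reject image URLs
                "+^http://([a-z0-9]*\\.)*bbc\\.com/")); // accept bbc.com hosts
        System.out.println(f.accepts("http://www.bbc.com/"));     // true
        System.out.println(f.accepts("http://www.example.com/")); // false: no rule matched
    }
}
```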
Re: zero pages
Hi

- 060123 081549 found resource crawl-urlfilter.txt at
  file:/D:/nutch-0.7.1/conf/crawl-urlfilter.txt
- .060123 081549 Added 0 pages  <-- no seeds found at all

Please check your url filters.

/Jack

On 1/23/06, Shahinul Islam <[EMAIL PROTECTED]> wrote:
> Hello, I just started using Nutch. I followed the tutorial and everything
> worked fine with *Intranet Crawling* on http://www.apache.org.
>
> The problem is that when I try any other site (e.g. http://www.bbc.com) it
> gets zero pages.
>
> [crawl log snipped]
zero pages
Hello, I just started using Nutch. I followed the tutorial and everything worked fine with *Intranet Crawling* on http://www.apache.org.

The problem is that when I try any other site (e.g. http://www.bbc.com) it gets zero pages. Below is the log:

run java in D:\j2sdk1.4.2_04
060123 081546 parsing file:/D:/nutch-0.7.1/conf/nutch-default.xml
060123 081547 parsing file:/D:/nutch-0.7.1/conf/crawl-tool.xml
060123 081547 parsing file:/D:/nutch-0.7.1/conf/nutch-site.xml
060123 081547 No FS indicated, using default:local
060123 081547 crawl started in: d:/crawled
060123 081547 rootUrlFile = urls
060123 081547 threads = 10
060123 081547 depth = 3
060123 081548 Created webdb at LocalFS,D:\crawled\db
060123 081548 Starting URL processing
060123 081548 Plugins: looking in: D:\nutch-0.7.1\plugins
060123 081548 parsing: D:\nutch-0.7.1\plugins\clustering-carrot2\plugin.xml
060123 081549 impl: point=org.apache.nutch.clustering.OnlineClusterer class=org.apache.nutch.clustering.carrot2.Clusterer
060123 081549 not including: D:\nutch-0.7.1\plugins\creativecommons
060123 081549 parsing: D:\nutch-0.7.1\plugins\index-basic\plugin.xml
060123 081549 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\index-more\plugin.xml
060123 081549 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.more.MoreIndexingFilter
060123 081549 not including: D:\nutch-0.7.1\plugins\language-identifier
060123 081549 parsing: D:\nutch-0.7.1\plugins\nutch-extensionpoints\plugin.xml
060123 081549 not including: D:\nutch-0.7.1\plugins\ontology
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-ext
060123 081549 parsing: D:\nutch-0.7.1\plugins\parse-html\plugin.xml
060123 081549 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-js
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-msword
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-pdf
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-rss
060123 081549 parsing: D:\nutch-0.7.1\plugins\parse-text\plugin.xml
060123 081549 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060123 081549 not including: D:\nutch-0.7.1\plugins\protocol-file
060123 081549 not including: D:\nutch-0.7.1\plugins\protocol-ftp
060123 081549 parsing: D:\nutch-0.7.1\plugins\protocol-http\plugin.xml
060123 081549 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060123 081549 parsing: D:\nutch-0.7.1\plugins\protocol-httpclient\plugin.xml
060123 081549 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060123 081549 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-basic\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-more\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.TypeQueryFilter
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-site\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-url\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060123 081549 not including: D:\nutch-0.7.1\plugins\urlfilter-prefix
060123 081549 parsing: D:\nutch-0.7.1\plugins\urlfilter-regex\plugin.xml
060123 081549 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060123 081549 found resource crawl-urlfilter.txt at file:/D:/nutch-0.7.1/conf/crawl-urlfilter.txt
.060123 081549 Added 0 pages
060123 081549 FetchListTool started
060123 081550 Overall processing: Sorted 0 entries in 0.0 seconds.
060123 081550 Overall processing: Sorted NaN entries/second
060123 081550 FetchListTool completed
060123 081550 logging at FINE
060123 081550 logging at INFO
060123 081551 Updating D:\crawled\db
060123 081551 Updating for D:\crawled\segments\20060123081549
060123 081551 Finishing update
060123 081551 Update finished
060123 081551 FetchListTool started
060123 081552 Overall processing: Sorted 0 entries in 0.0 seconds.
060123 081552 Overall processing: Sorted NaN entries/second
060123 081552 FetchListTool completed
060123 081552 logging at INFO
060123 081553 Updating D:\crawled\db
060123 081553 Updating for D:\crawled\segments\20060123081551
060123 081553 Finishing update
060123 081553 Update finished
060123 081553 FetchListTool started
060123