Re: zero pages
Dear Jack,

Thank you so much for your help and good wishes :) Things are now as I expected. As it turns out, I had done something very stupid at 4 in the morning, and that is why it wasn't working.

On 1/23/06, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi
>
> You can test your URL before crawling by running the
> org.apache.nutch.net.RegexURLFilter class (in the urlfilter-regexp
> plugin, if you keep the default Nutch config). Also, please note that
> '.' (dot) is a metacharacter in regular expressions, so you can also try
>
> +^http://([a-z0-9]*\.)*bbc\.com/
>
> Good luck!
>
> /Jack
>
> [earlier quoted messages and crawl log snipped]
Re: zero pages
Hi

You can test your URL before crawling by running the
org.apache.nutch.net.RegexURLFilter class (in the urlfilter-regexp
plugin, if you keep the default Nutch config). Also, please note that
'.' (dot) is a metacharacter in regular expressions, so you can also try

+^http://([a-z0-9]*\.)*bbc\.com/

Good luck!

/Jack

On 1/23/06, Shahinul Islam <[EMAIL PROTECTED]> wrote:
> Hello, thank you so much for your email. I have checked the url filter many
> times before even sending the email.
>
> The content of that file is
> +^http://([a-z0-9]*\.)*bbc.com/
> while the content of the flat url file is
> http://www.bbc.com
>
> The strange thing is that it is working with this config; the content of
> that file is
> +^http://([a-z0-9]*\.)*apache.com/
> while the content of the flat url file is
> http://www.apache.com
>
> [earlier quoted messages and crawl log snipped]
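Jack's point about the unescaped dot can be checked in a few lines of plain Java. This is a standalone illustration using java.util.regex, not Nutch code; the host "bbcxcom" is a made-up example. Note that the unescaped pattern matches *more* URLs, not fewer, since '.' matches any character — so it still accepts http://www.bbc.com/, it just also accepts look-alike hosts:

```java
import java.util.regex.Pattern;

public class RegexDotDemo {
    public static void main(String[] args) {
        // Unescaped dot: '.' matches ANY character, so "bbc.com" also
        // matches hosts like "bbcxcom".
        Pattern loose  = Pattern.compile("^http://([a-z0-9]*\\.)*bbc.com/");
        // Escaped dot: '\.' matches only a literal '.'.
        Pattern strict = Pattern.compile("^http://([a-z0-9]*\\.)*bbc\\.com/");

        String intended   = "http://www.bbc.com/";
        String lookAlike  = "http://www.bbcxcom/";   // hypothetical host

        System.out.println(loose.matcher(intended).find());   // true
        System.out.println(loose.matcher(lookAlike).find());  // true  <- unintended
        System.out.println(strict.matcher(intended).find());  // true
        System.out.println(strict.matcher(lookAlike).find()); // false
    }
}
```

Escaping the dot is the correct fix for precision, but since the loose pattern is strictly more permissive, it would not by itself explain getting zero pages.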
Re: zero pages
Hello, thank you so much for your email. I have checked the url filter many times before even sending the email.

The content of that file is

+^http://([a-z0-9]*\.)*bbc.com/

while the content of the flat url file is

http://www.bbc.com

The strange thing is that it is working with this config; the content of that file is

+^http://([a-z0-9]*\.)*apache.com/

while the content of the flat url file is

http://www.apache.com

On 1/23/06, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi
>
> 060123 081549 found resource crawl-urlfilter.txt at
> file:/D:/nutch-0.7.1/conf/crawl-urlfilter.txt
> .060123 081549 Added 0 pages  <-- no seeds found at all
>
> Please check your url filters.
>
> /Jack
>
> [earlier quoted message and crawl log snipped]
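For anyone following along, the accept/reject behaviour of a crawl-urlfilter.txt file can be sketched in a few lines. This is a minimal re-implementation for illustration only (not Nutch's actual RegexURLFilter, and the class name is made up): each non-comment line is '+' or '-' followed by a regex, the first rule whose regex matches the URL decides, and a URL matching no rule is rejected — which is why a seed URL that slips past every '+' rule yields "Added 0 pages".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // Each rule: accept flag ('+' line) plus its compiled regex.
    private final List<Map.Entry<Boolean, Pattern>> rules = new ArrayList<>();

    public UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip comments
            boolean accept = line.charAt(0) == '+';
            rules.add(new AbstractMap.SimpleEntry<>(
                    accept, Pattern.compile(line.substring(1))));
        }
    }

    public boolean accepts(String url) {
        // First matching rule wins; no match means reject.
        for (Map.Entry<Boolean, Pattern> rule : rules) {
            if (rule.getValue().matcher(url).find()) return rule.getKey();
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch f = new UrlFilterSketch(Arrays.asList(
                "-\\.(gif|jpg|png)$",                  // reject image URLs
                "+^http://([a-z0-9]*\\.)*bbc\\.com/")); // accept bbc.com hosts
        System.out.println(f.accepts("http://www.bbc.com/"));     // true
        System.out.println(f.accepts("http://www.example.com/")); // false: no rule matched
    }
}
```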
Re: zero pages
Hi

- 060123 081549 found resource crawl-urlfilter.txt at
  file:/D:/nutch-0.7.1/conf/crawl-urlfilter.txt
- .060123 081549 Added 0 pages  <-- no seeds found at all

Please check your url filters.

/Jack

On 1/23/06, Shahinul Islam <[EMAIL PROTECTED]> wrote:
> Hello, I just started using Nutch. I followed the tutorial and everything
> worked fine with *Intranet Crawling* on http://www.apache.org.
>
> The problem is that when I try any other site (e.g. http://www.bbc.com) it
> gets zero pages.
>
> [crawl log snipped]
zero pages
Hello, I just started using Nutch. I followed the tutorial and everything worked fine with *Intranet Crawling* on http://www.apache.org.

The problem is that when I try any other site (e.g. http://www.bbc.com) it gets zero pages. Below is the log:

run java in D:\j2sdk1.4.2_04
060123 081546 parsing file:/D:/nutch-0.7.1/conf/nutch-default.xml
060123 081547 parsing file:/D:/nutch-0.7.1/conf/crawl-tool.xml
060123 081547 parsing file:/D:/nutch-0.7.1/conf/nutch-site.xml
060123 081547 No FS indicated, using default:local
060123 081547 crawl started in: d:/crawled
060123 081547 rootUrlFile = urls
060123 081547 threads = 10
060123 081547 depth = 3
060123 081548 Created webdb at LocalFS,D:\crawled\db
060123 081548 Starting URL processing
060123 081548 Plugins: looking in: D:\nutch-0.7.1\plugins
060123 081548 parsing: D:\nutch-0.7.1\plugins\clustering-carrot2\plugin.xml
060123 081549 impl: point=org.apache.nutch.clustering.OnlineClusterer class=org.apache.nutch.clustering.carrot2.Clusterer
060123 081549 not including: D:\nutch-0.7.1\plugins\creativecommons
060123 081549 parsing: D:\nutch-0.7.1\plugins\index-basic\plugin.xml
060123 081549 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\index-more\plugin.xml
060123 081549 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.more.MoreIndexingFilter
060123 081549 not including: D:\nutch-0.7.1\plugins\language-identifier
060123 081549 parsing: D:\nutch-0.7.1\plugins\nutch-extensionpoints\plugin.xml
060123 081549 not including: D:\nutch-0.7.1\plugins\ontology
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-ext
060123 081549 parsing: D:\nutch-0.7.1\plugins\parse-html\plugin.xml
060123 081549 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-js
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-msword
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-pdf
060123 081549 not including: D:\nutch-0.7.1\plugins\parse-rss
060123 081549 parsing: D:\nutch-0.7.1\plugins\parse-text\plugin.xml
060123 081549 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060123 081549 not including: D:\nutch-0.7.1\plugins\protocol-file
060123 081549 not including: D:\nutch-0.7.1\plugins\protocol-ftp
060123 081549 parsing: D:\nutch-0.7.1\plugins\protocol-http\plugin.xml
060123 081549 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060123 081549 parsing: D:\nutch-0.7.1\plugins\protocol-httpclient\plugin.xml
060123 081549 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060123 081549 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-basic\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-more\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.TypeQueryFilter
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-site\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060123 081549 parsing: D:\nutch-0.7.1\plugins\query-url\plugin.xml
060123 081549 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060123 081549 not including: D:\nutch-0.7.1\plugins\urlfilter-prefix
060123 081549 parsing: D:\nutch-0.7.1\plugins\urlfilter-regex\plugin.xml
060123 081549 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060123 081549 found resource crawl-urlfilter.txt at file:/D:/nutch-0.7.1/conf/crawl-urlfilter.txt
.060123 081549 Added 0 pages
060123 081549 FetchListTool started
060123 081550 Overall processing: Sorted 0 entries in 0.0 seconds.
060123 081550 Overall processing: Sorted NaN entries/second
060123 081550 FetchListTool completed
060123 081550 logging at FINE
060123 081550 logging at INFO
060123 081551 Updating D:\crawled\db
060123 081551 Updating for D:\crawled\segments\20060123081549
060123 081551 Finishing update
060123 081551 Update finished
060123 081551 FetchListTool started
060123 081552 Overall processing: Sorted 0 entries in 0.0 seconds.
060123 081552 Overall processing: Sorted NaN entries/second
060123 081552 FetchListTool completed
060123 081552 logging at INFO
060123 081553 Updating D:\crawled\db
060123 081553 Updating for D:\crawled\segments\20060123081551
060123 081553 Finishing update
060123 081553 Update finished
060123 081553 FetchListTool started
060123