Hi,
I'm using nutch-2006-02-22.tar.gz (release 0.8) to crawl the web.
But when I run the
 "bin/nutch crawl seeds -dir cnblogs -depth 3"
command, I always get negative map progress.
It looks like this:
"
060224 135430 seeds\urls.txt:0+23
060224 135431 seeds\urls.txt:0+23
060224 135432  map -32509%  reduce 0%
060224 135432 seeds\urls.txt:0+23
060224 135433  map -223717%  reduce 0%
060224 135433 seeds\urls.txt:0+23
060224 135434  map -431935%  reduce 0%
060224 135434 seeds\urls.txt:0+23
060224 135435  map -594322%  reduce 0%
060224 135435 seeds\urls.txt:0+23
060224 135436 seeds\urls.txt:0+23
060224 135436  map -894617%  reduce 0%
060224 135437 seeds\urls.txt:0+23
060224 135437  map -1081922%  reduce 0%
060224 135438 seeds\urls.txt:0+23
060224 135438  map -1297061%  reduce 0%
060224 135439 seeds\urls.txt:0+23
060224 135439  map -1512170%  reduce 0%
"
Can you see it? Why does the map progress become negative?

Thanks!

More of the log:
"

$ bin/nutch crawl seeds -dir cnblogs -depth 3
060224 135426 parsing jar:file:/C:/cygwin/home/VictorZheng/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060224 135427 parsing file:/C:/cygwin/home/VictorZheng/nutch/conf/nutch-default.xml
060224 135427 parsing file:/C:/cygwin/home/VictorZheng/nutch/conf/crawl-tool.xml
060224 135427 parsing jar:file:/C:/cygwin/home/VictorZheng/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060224 135427 parsing file:/C:/cygwin/home/VictorZheng/nutch/conf/nutch-site.xml
060224 135427 parsing file:/C:/cygwin/home/VictorZheng/nutch/conf/hadoop-site.xml
060224 135427 crawl started in: cnblogs
060224 135427 rootUrlDir = seeds
060224 135427 threads = 10
060224 135427 depth = 3
060224 135427 Injector: starting
060224 135427 Injector: crawlDb: cnblogs\crawldb
060224 135427 Injector: urlDir: seeds
060224 135427 Injector: Converting injected urls to crawl db entries.

...

060224 135429 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060224 135429  map 0%  reduce 0%
060224 135429 Plugins: looking in: C:\cygwin\home\VictorZheng\nutch\plugins
060224 135430 Plugin Auto-activation mode: [true]
060224 135430 Registered Plugins:
060224 135430   HTTP Framework (lib-http)
060224 135430   CyberNeko HTML Parser (lib-nekohtml)
060224 135430   URL Query Filter (query-url)
060224 135430   Site Query Filter (query-site)
060224 135430   Html Parse Plug-in (parse-html)
060224 135430   Http Protocol Plug-in (protocol-http)
060224 135430   the nutch core extension points (nutch-extensionpoints)
060224 135430   Basic Indexing Filter (index-basic)
060224 135430   Text Parse Plug-in (parse-text)
060224 135430   JavaScript Parser (parse-js)
060224 135430   Regex URL Filter (urlfilter-regex)
060224 135430   Basic Query Filter (query-basic)
060224 135430 Registered Extension-Points:
060224 135430   Nutch Protocol (org.apache.nutch.protocol.Protocol)
060224 135430   Nutch URL Filter (org.apache.nutch.net.URLFilter)
060224 135430   HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
060224 135430   Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
060224 135430   Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
060224 135430   Nutch Content Parser (org.apache.nutch.parse.Parser)
060224 135430   Ontology Model Loader (org.apache.nutch.ontology.Ontology)
060224 135430   Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
060224 135430   Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
060224 135430 found resource crawl-urlfilter.txt at file:/C:/cygwin/home/VictorZheng/nutch/conf/crawl-urlfilter.txt
060224 135430 seeds\urls.txt:0+23
060224 135431 seeds\urls.txt:0+23
060224 135432  map -32509%  reduce 0%
060224 135432 seeds\urls.txt:0+23
060224 135433  map -223717%  reduce 0%
060224 135433 seeds\urls.txt:0+23
060224 135434  map -431935%  reduce 0%
060224 135434 seeds\urls.txt:0+23
"
