Running Issue about Nutch 1.3

2011-11-03 Thread Skiming_Zhang
Hello,

I found the following output in hadoop.log after configuring Nutch 1.3 in
Eclipse (Windows 7), but I don't know how to resolve it. Can you help me?
I'm new to Nutch, so please forgive any wrong terminology.

 

2011-11-03 16:51:53,300 WARN  crawl.Crawl - solrUrl is not set, indexing will be skipped...
2011-11-03 16:51:53,502 INFO  crawl.Crawl - crawl started in: crawl
2011-11-03 16:51:53,502 INFO  crawl.Crawl - rootUrlDir = urls
2011-11-03 16:51:53,502 INFO  crawl.Crawl - threads = 4
2011-11-03 16:51:53,502 INFO  crawl.Crawl - depth = 5
2011-11-03 16:51:53,502 INFO  crawl.Crawl - solrUrl=null
2011-11-03 16:51:53,502 INFO  crawl.Crawl - topN = 10
2011-11-03 16:51:53,518 INFO  crawl.Injector - Injector: starting at 2011-11-03 16:51:53
2011-11-03 16:51:53,518 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2011-11-03 16:51:53,518 INFO  crawl.Injector - Injector: urlDir: urls
2011-11-03 16:51:53,534 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2011-11-03 16:51:53,658 WARN  mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

2011-11-03 16:51:54,267 INFO  plugin.PluginRepository - Plugins: looking in: E:\IdealTimes\WorkSpace\Nutch1.3\plugin
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Registered Plugins:
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Html Parse Plug-in (parse-html)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Basic Indexing Filter (index-basic)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - HTTP Framework (lib-http)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Registered Extension-Points:
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2011-11-03 16:51:54,345 INFO  plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)

2011-11-03 16:51:54,345 WARN  net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-basic instance in getURLNormalizers function: attempting to continue instantiating plugins
2011-11-03 16:51:54,360 WARN  net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-regex instance in getURLNormalizers function: attempting to continue instantiating plugins
2011-11-03 16:51:54,360 WARN  net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-pass instance in getURLNormalizers function: attempting to continue instantiating plugins
2011-11-03 16:51:54,360 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Error in configuring object
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

 at 

Re: Running Issue about Nutch 1.3

2011-11-03 Thread Markus Jelsma
Hi

Please use the user@nutch mailing list for user-related questions. This list
is for the development of Nutch itself.

Cheers
