I am sending my Hadoop log file. I have also applied patch559V0.5.

When fetching, I get the following messages:
---------------------------------------------------------
Fetcher: starting
Fetcher: segment: crawl/segments/20080103125023
Fetcher: threads: 10
fetching http://www.w3schools.com/
http.proxy.host = netmon.iitb.ac.in
http.proxy.port = 80
http.timeout = 100000
http.content.limit = 65536
http.agent = digi/Nutch-0.9 (digvijay; http://www.google.com; [EMAIL PROTECTED])
protocol.plugin.check.blocking = true
protocol.plugin.check.robots = true
fetcher.server.delay = 5000
http.max.delays = 100
Configured Client
fetch of http://www.w3schools.com/ failed with: Http code=407, url=http://www.w3schools.com/
Fetcher: done
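HTTP code 407 means "Proxy Authentication Required": the proxy (here netmon.iitb.ac.in:80, which the log identifies as a Squid server asking for BASIC auth) refuses to forward the request until the client sends credentials. With the Basic scheme, the client retries with a Proxy-Authorization header whose value is "Basic " followed by the Base64 encoding of user:password. The sketch below, using only the JDK, shows how that header value is formed; the class name and the credentials are placeholders for illustration, not values from this setup, and how Nutch's protocol-httpclient receives the credentials depends on the patch's own configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ProxyBasicAuth {
    // Builds a Proxy-Authorization value for the Basic scheme:
    // "Basic " + Base64(user + ":" + password), encoded as UTF-8 bytes.
    static String proxyAuthorization(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical credentials; replace with an account the proxy accepts.
        System.out.println("Proxy-Authorization: "
                + proxyAuthorization("proxyuser", "proxypass"));
    }
}
```

The log lines "No credentials available for BASIC 'Squid proxy-caching web server'@netmon.iitb.ac.in:80" indicate that the HTTP client received this challenge but had no such credentials configured to answer it.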

----------------------------------------------------------------------------
2008-01-03 12:50:04,275 INFO  crawl.Injector - Injector: starting
2008-01-03 12:50:04,347 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2008-01-03 12:50:04,347 INFO  crawl.Injector - Injector: urlDir: urls
2008-01-03 12:50:04,895 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2008-01-03 12:50:11,140 INFO  plugin.PluginRepository - Plugins: looking in: /home/digvijay/Nutch/nutch-0.9/plugins
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - Registered Plugins:
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	Pdf Parse Plug-in (parse-pdf)
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	MSWord Parse Plug-in (parse-msword)
2008-01-03 12:50:12,171 INFO  plugin.PluginRepository - 	XML Libraries (lib-xml)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	MSExcel Parse Plug-in (parse-msexcel)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Zip Parse Plug-in (parse-zip)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Parse MS Documents Framework (lib-parsems)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	the nutch core extension points (nutch-extensionpoints)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Basic Query Filter (query-basic)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Basic URL Normalizer (urlnormalizer-basic)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Html Parse Plug-in (parse-html)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	RSS Parse Plug-in (parse-rss)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Text Parse Plug-in (parse-text)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	Log4j (lib-log4j)
2008-01-03 12:50:12,172 INFO  plugin.PluginRepository - 	JavaScript Parser (parse-js)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	SWF Parse Plug-in (parse-swf)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - Registered Extension-Points:
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-01-03 12:50:12,173 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-01-03 12:50:12,174 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-01-03 12:50:12,174 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-01-03 12:50:12,174 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-01-03 12:50:12,174 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-01-03 12:50:12,174 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-01-03 12:50:12,174 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-01-03 12:50:12,381 WARN  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2008-01-03 12:50:13,048 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
2008-01-03 12:50:19,684 INFO  crawl.Injector - Injector: done
2008-01-03 12:50:23,951 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2008-01-03 12:50:23,952 INFO  crawl.Generator - Generator: starting
2008-01-03 12:50:23,952 INFO  crawl.Generator - Generator: segment: crawl/segments/20080103125023
2008-01-03 12:50:23,952 INFO  crawl.Generator - Generator: filtering: true
2008-01-03 12:50:23,999 INFO  crawl.Generator - Generator: jobtracker is 'local', generating exactly one partition.
2008-01-03 12:50:25,101 INFO  plugin.PluginRepository - Plugins: looking in: /home/digvijay/Nutch/nutch-0.9/plugins
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - Registered Plugins:
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Pdf Parse Plug-in (parse-pdf)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	MSWord Parse Plug-in (parse-msword)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	XML Libraries (lib-xml)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	MSExcel Parse Plug-in (parse-msexcel)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Zip Parse Plug-in (parse-zip)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Parse MS Documents Framework (lib-parsems)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	the nutch core extension points (nutch-extensionpoints)
2008-01-03 12:50:25,251 INFO  plugin.PluginRepository - 	MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Basic Query Filter (query-basic)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Basic URL Normalizer (urlnormalizer-basic)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Html Parse Plug-in (parse-html)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	RSS Parse Plug-in (parse-rss)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Text Parse Plug-in (parse-text)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Log4j (lib-log4j)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	JavaScript Parser (parse-js)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	SWF Parse Plug-in (parse-swf)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - Registered Extension-Points:
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-01-03 12:50:25,252 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-01-03 12:50:25,253 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-01-03 12:50:25,354 WARN  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2008-01-03 12:50:25,409 INFO  plugin.PluginRepository - Plugins: looking in: /home/digvijay/Nutch/nutch-0.9/plugins
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - Registered Plugins:
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	Pdf Parse Plug-in (parse-pdf)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	MSWord Parse Plug-in (parse-msword)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	XML Libraries (lib-xml)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	MSExcel Parse Plug-in (parse-msexcel)
2008-01-03 12:50:25,544 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Zip Parse Plug-in (parse-zip)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Parse MS Documents Framework (lib-parsems)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	the nutch core extension points (nutch-extensionpoints)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Basic Query Filter (query-basic)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Basic URL Normalizer (urlnormalizer-basic)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Html Parse Plug-in (parse-html)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	RSS Parse Plug-in (parse-rss)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Text Parse Plug-in (parse-text)
2008-01-03 12:50:25,545 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Log4j (lib-log4j)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	JavaScript Parser (parse-js)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	SWF Parse Plug-in (parse-swf)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - Registered Extension-Points:
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-01-03 12:50:25,546 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-01-03 12:50:25,547 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-01-03 12:50:25,547 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-01-03 12:50:26,043 INFO  crawl.Generator - Generator: Partitioning selected urls by host, for politeness.
2008-01-03 12:50:26,327 INFO  plugin.PluginRepository - Plugins: looking in: /home/digvijay/Nutch/nutch-0.9/plugins
2008-01-03 12:50:26,428 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-01-03 12:50:26,428 INFO  plugin.PluginRepository - Registered Plugins:
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Pdf Parse Plug-in (parse-pdf)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	MSWord Parse Plug-in (parse-msword)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	XML Libraries (lib-xml)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	MSExcel Parse Plug-in (parse-msexcel)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Zip Parse Plug-in (parse-zip)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Parse MS Documents Framework (lib-parsems)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	the nutch core extension points (nutch-extensionpoints)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Basic Query Filter (query-basic)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Basic URL Normalizer (urlnormalizer-basic)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Html Parse Plug-in (parse-html)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	RSS Parse Plug-in (parse-rss)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2008-01-03 12:50:26,429 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Text Parse Plug-in (parse-text)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Log4j (lib-log4j)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	JavaScript Parser (parse-js)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	SWF Parse Plug-in (parse-swf)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - Registered Extension-Points:
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-01-03 12:50:26,430 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-01-03 12:50:26,431 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-01-03 12:50:26,447 WARN  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2008-01-03 12:50:27,312 INFO  crawl.Generator - Generator: done.
2008-01-03 12:50:38,553 INFO  fetcher.Fetcher - Fetcher: starting
2008-01-03 12:50:38,554 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20080103125023
2008-01-03 12:50:39,097 INFO  fetcher.Fetcher - Fetcher: threads: 10
2008-01-03 12:50:39,111 INFO  plugin.PluginRepository - Plugins: looking in: /home/digvijay/Nutch/nutch-0.9/plugins
2008-01-03 12:50:39,253 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-01-03 12:50:39,253 INFO  plugin.PluginRepository - Registered Plugins:
2008-01-03 12:50:39,253 INFO  plugin.PluginRepository - 	Pdf Parse Plug-in (parse-pdf)
2008-01-03 12:50:39,253 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2008-01-03 12:50:39,253 INFO  plugin.PluginRepository - 	Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-01-03 12:50:39,253 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2008-01-03 12:50:39,254 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2008-01-03 12:50:39,254 INFO  plugin.PluginRepository - 	MSWord Parse Plug-in (parse-msword)
2008-01-03 12:50:39,254 INFO  plugin.PluginRepository - 	XML Libraries (lib-xml)
2008-01-03 12:50:39,254 INFO  plugin.PluginRepository - 	MSExcel Parse Plug-in (parse-msexcel)
2008-01-03 12:50:39,254 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Zip Parse Plug-in (parse-zip)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Parse MS Documents Framework (lib-parsems)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	the nutch core extension points (nutch-extensionpoints)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Basic Query Filter (query-basic)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Basic URL Normalizer (urlnormalizer-basic)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Html Parse Plug-in (parse-html)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	RSS Parse Plug-in (parse-rss)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Text Parse Plug-in (parse-text)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	Log4j (lib-log4j)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	JavaScript Parser (parse-js)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - 	SWF Parse Plug-in (parse-swf)
2008-01-03 12:50:39,255 INFO  plugin.PluginRepository - Registered Extension-Points:
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-01-03 12:50:39,256 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-01-03 12:50:39,334 INFO  fetcher.Fetcher - fetching http://www.w3schools.com/
2008-01-03 12:50:39,498 FATAL api.RobotRulesParser - Agent we advertise (digi) not listed first in 'http.robots.agents' property!
2008-01-03 12:50:39,498 INFO  httpclient.Http - http.proxy.host = netmon.iitb.ac.in
2008-01-03 12:50:39,498 INFO  httpclient.Http - http.proxy.port = 80
2008-01-03 12:50:39,498 INFO  httpclient.Http - http.timeout = 100000
2008-01-03 12:50:39,498 INFO  httpclient.Http - http.content.limit = 65536
2008-01-03 12:50:39,498 INFO  httpclient.Http - http.agent = digi/Nutch-0.9 (digvijay; http://www.google.com; [EMAIL PROTECTED])
2008-01-03 12:50:39,499 INFO  httpclient.Http - protocol.plugin.check.blocking = true
2008-01-03 12:50:39,499 INFO  httpclient.Http - protocol.plugin.check.robots = true
2008-01-03 12:50:39,499 INFO  httpclient.Http - fetcher.server.delay = 5000
2008-01-03 12:50:39,499 INFO  httpclient.Http - http.max.delays = 100
2008-01-03 12:50:39,506 INFO  httpclient.Http - Configured Client
2008-01-03 12:50:47,682 INFO  auth.AuthChallengeProcessor - basic authentication scheme selected
2008-01-03 12:50:47,684 INFO  httpclient.HttpMethodDirector - No credentials available for BASIC 'Squid proxy-caching web server'@netmon.iitb.ac.in:80
2008-01-03 12:50:57,359 INFO  auth.AuthChallengeProcessor - basic authentication scheme selected
2008-01-03 12:50:57,359 INFO  httpclient.HttpMethodDirector - No credentials available for BASIC 'Squid proxy-caching web server'@netmon.iitb.ac.in:80
2008-01-03 12:50:57,407 INFO  fetcher.Fetcher - fetch of http://www.w3schools.com/ failed with: Http code=407, url=http://www.w3schools.com/
2008-01-03 12:50:58,508 INFO  plugin.PluginRepository - Plugins: looking in: /home/digvijay/Nutch/nutch-0.9/plugins
2008-01-03 12:50:58,639 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-01-03 12:50:58,639 INFO  plugin.PluginRepository - Registered Plugins:
2008-01-03 12:50:58,639 INFO  plugin.PluginRepository - 	Pdf Parse Plug-in (parse-pdf)
2008-01-03 12:50:58,639 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2008-01-03 12:50:58,639 INFO  plugin.PluginRepository - 	Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	MSWord Parse Plug-in (parse-msword)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	XML Libraries (lib-xml)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	MSExcel Parse Plug-in (parse-msexcel)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Zip Parse Plug-in (parse-zip)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Parse MS Documents Framework (lib-parsems)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	the nutch core extension points (nutch-extensionpoints)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Basic Query Filter (query-basic)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Basic URL Normalizer (urlnormalizer-basic)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Html Parse Plug-in (parse-html)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	RSS Parse Plug-in (parse-rss)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2008-01-03 12:50:58,640 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Text Parse Plug-in (parse-text)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Log4j (lib-log4j)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	JavaScript Parser (parse-js)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	SWF Parse Plug-in (parse-swf)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - Registered Extension-Points:
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-01-03 12:50:58,641 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-01-03 12:50:59,120 INFO  fetcher.Fetcher - Fetcher: done
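Independently of the 407, the log also contains a FATAL from RobotRulesParser: the advertised agent name (digi) is not the first entry in the comma-separated http.robots.agents property. The sketch below illustrates that ordering rule; it is written for illustration and is not Nutch's source, and the class and method names are hypothetical. An example value that would satisfy the check is "digi,*" (agent name first, the wildcard last), which is an assumption about this setup rather than something taken from the post.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsAgentCheck {
    // Splits a comma-separated agent list (as in http.robots.agents)
    // into trimmed, non-empty names, preserving their order.
    static List<String> parseAgents(String agents) {
        List<String> result = new ArrayList<>();
        for (String a : agents.split(",")) {
            String t = a.trim();
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // True when the advertised agent name is the first entry (case-insensitive).
    static boolean agentListedFirst(String agentName, String agents) {
        List<String> list = parseAgents(agents);
        return !list.isEmpty() && list.get(0).equalsIgnoreCase(agentName);
    }

    public static void main(String[] args) {
        System.out.println(agentListedFirst("digi", "digi,*"));
        System.out.println(agentListedFirst("digi", "*"));
    }
}
```

This message concerns robots.txt matching only; it does not by itself explain the 407, which comes from the proxy.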
