Sort of figured out how to kickstart the crawl again. Basically did:
s1=`ls -d crawl/segments/* | tail -1`
bin/nutch updatedb crawl/crawldb $s1
bin/nutch generate crawl/crawldb crawl/segments
s2=`ls -d crawl/segments/* | tail -1`
bin/nutch fetch $s2

But unfortunately this is fetching the same URLs as the previous fetch. :(

From: [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Subject: RE: Job failed!
Date: Fri, 5 Sep 2008 09:45:00 +0000

Initially I just did a tail -10 so thought there were no errors, but there are a few actually. The pdf errors are my fault because I updated the pdf plugin with the latest PDFBox and FontBox jars from cvs on sf.net and missed out parse-pdf.jar on the rebuild. I'm not sure that's the reason why the job failed though. The log is 5MB so I can't really attach it all here, but hopefully the last 200 lines give an indication.

By the way, is there a way to kickstart this crawl off again without crawling from the start again?

tail -200 hadoop.log.2008-09-05
2008-09-05 03:41:22,360 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2008-09-05 03:41:22,360 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
2008-09-05 03:41:22,360 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
2008-09-05 03:41:22,360 WARN parse.ParserFactory - ... 4 more
2008-09-05 03:41:22,360 WARN parse.ParserFactory - ParserFactory:PluginRuntimeException when initializing parser plugin parse-pdf instance in getParsers function: attempting to continue instantiating parsers
2008-09-05 03:41:22,360 WARN parse.ParseUtil - Unable to successfully parse content http://planetba.baplc.com/general/aptrix/aptcsops.nsf/AttachmentsByTitle/Premium+Service+Training+insert/$FILE/Premium+training.pdf of type application/pdf
2008-09-05 03:41:22,362 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptrix.nsf/Content/CTP+-+Travel+Plan+Objectives?OpenDocument
2008-09-05 03:41:23,616 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptsal.nsf/Content/ctcBA+Home%5CBusTools%5CC%5Ccomp+tckts%5Ccr+comp+tickets?OpenDocument
2008-09-05 03:41:24,745 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptrix.nsf/Content/Notes+7+-+93+Rooms?OpenDocument
2008-09-05 03:41:26,033 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptcsops.nsf/AttachmentsByTitle/SCCM+January+/$FILE/SCCMonthly+-+NovDec%2C+07+%2808+Jan%2C+08%29.pdf
2008-09-05 03:41:27,215 WARN parse.ParserFactory - org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:133)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:67)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:355)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at java.security.AccessController.doPrivileged(Native Method)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
2008-09-05 03:41:27,216 WARN parse.ParserFactory - ... 4 more
2008-09-05 03:41:27,216 WARN parse.ParserFactory - ParserFactory:PluginRuntimeException when initializing parser plugin parse-pdf instance in getParsers function: attempting to continue instantiating parsers
2008-09-05 03:41:27,216 WARN parse.ParseUtil - Unable to successfully parse content http://planetba.baplc.com/general/aptrix/aptcsops.nsf/AttachmentsByTitle/SCCM+January+/$FILE/SCCMonthly+-+NovDec%2C+07+%2808+Jan%2C+08%29.pdf of type application/pdf
2008-09-05 03:41:27,216 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/bani.nsf/Content/XXXXLS%5FQ1Results%5F030807%5CXXXXLS%5FQ1Resultsvideo%5F030807?opendocument
2008-09-05 03:41:28,451 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptflt.nsf/AttachmentsByTitle/Flight+Ops+News+Aug+2008/$FILE/FLIGHT+OPS_AUGUST_08+intranet+live.pdf
2008-09-05 03:41:29,760 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptrix.nsf/Content/Virus+2+questions?OpenDocument
2008-09-05 03:41:30,789 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptrix.nsf/Content/Gender+Reass+the+process?OpenDocument
2008-09-05 03:41:32,066 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptcsops.nsf/AttachmentsByTitle/LGW+Crew+Responsibilities/$FILE/Crew+Responsibilities.doc
2008-09-05 03:41:33,390 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptflt.nsf/Content/Flight+Ops+Home%5CBusiness+Tools%5CFlight+Technical+Services%5CAircraft+Weights+%26+Evaluation%5CFleet+Weights+-+Aircraft+Weighing+Schedules?OpenDocument
2008-09-05 03:41:34,562 WARN parse.ParserFactory - org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:133)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:67)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:355)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at java.security.AccessController.doPrivileged(Native Method)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2008-09-05 03:41:34,562 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
2008-09-05 03:41:34,563 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
2008-09-05 03:41:34,563 WARN parse.ParserFactory - ... 4 more
2008-09-05 03:41:34,563 WARN parse.ParserFactory - ParserFactory:PluginRuntimeException when initializing parser plugin parse-pdf instance in getParsers function: attempting to continue instantiating parsers
2008-09-05 03:41:34,563 WARN parse.ParseUtil - Unable to successfully parse content http://planetba.baplc.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/T5+Retail+-+T5+Ground+Level/$FILE/T5_Ground_Level.pdf of type application/pdf
2008-09-05 03:41:34,564 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/travel/stpg2.nsf/072561aa006322660725618c006b09a0/fc11f85e25deb736802574a30033c99e?OpenDocument
2008-09-05 03:41:35,926 WARN parse.ParserFactory - org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:133)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:67)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:355)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at java.security.AccessController.doPrivileged(Native Method)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
2008-09-05 03:41:35,926 WARN parse.ParserFactory - ... 4 more
2008-09-05 03:41:35,926 WARN parse.ParserFactory - ParserFactory:PluginRuntimeException when initializing parser plugin parse-pdf instance in getParsers function: attempting to continue instantiating parsers
2008-09-05 03:41:35,926 WARN parse.ParseUtil - Unable to successfully parse content http://planetba.baplc.com/general/aptrix/aptrix.nsf/AttachmentsByTitle/Diversity+dignity+at+work+booklet/$FILE/Dignity+at+work+booklet.pdf of type application/pdf
2008-09-05 03:41:35,928 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/communications/wtps1.nsf/$lookup/1D94AD9A45B463638025730100263FDF
2008-09-05 03:41:36,988 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptcsops.nsf/AttachmentsByTitle/Barplus+Hints+and+Tips/$FILE/Barplus+Hints+and+Tips.pdf
2008-09-05 03:41:38,217 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home%5CDepartment+Information%5CEngineering+IT+Support+%26+Delivery+Homepage%5CEngineering+Solution+Group+%28ESG%29+Homepage%5CKey+user+Guides?OpenDocument
2008-09-05 03:41:41,143 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptrix.nsf/Content/Cultural+Awareness+Photo+Prize+Draw?OpenDocument
2008-09-05 03:41:42,278 WARN parse.ParserFactory - org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:133)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:67)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:355)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at java.security.AccessController.doPrivileged(Native Method)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
2008-09-05 03:41:42,279 WARN parse.ParserFactory - ... 4 more
2008-09-05 03:41:42,279 WARN parse.ParserFactory - ParserFactory:PluginRuntimeException when initializing parser plugin parse-pdf instance in getParsers function: attempting to continue instantiating parsers
2008-09-05 03:41:42,279 WARN parse.ParseUtil - Unable to successfully parse content http://planetba.baplc.com/general/aptrix/aptflt.nsf/AttachmentsByTitle/Flight+Ops+News+Aug+2008/$FILE/FLIGHT+OPS_AUGUST_08+intranet+live.pdf of type application/pdf
2008-09-05 03:41:42,313 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptsal.nsf/Content/ctcBA+Home%5CBusTools%5CB%5Cbah%5CPromos+pckge%5CFlrda+08+EBO+WTP+upgde?OpenDocument
2008-09-05 03:41:42,342 INFO fetcher.Fetcher - fetching http://planetba.baplc.com/general/aptrix/aptrix.nsf/Content/PMA+EG904+timescales?OpenDocument
2008-09-05 03:41:52,279 WARN parse.ParserFactory - org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:133)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:67)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:355)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.pdf.PdfParser
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at java.security.AccessController.doPrivileged(Native Method)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
2008-09-05 03:41:52,279 WARN parse.ParserFactory - ... 4 more
2008-09-05 03:41:52,279 WARN parse.ParserFactory - ParserFactory:PluginRuntimeException when initializing parser plugin parse-pdf instance in getParsers function: attempting to continue instantiating parsers
2008-09-05 03:41:52,279 WARN parse.ParseUtil - Unable to successfully parse content http://planetba.baplc.com/general/aptrix/aptcsops.nsf/AttachmentsByTitle/Barplus+Hints+and+Tips/$FILE/Barplus+Hints+and+Tips.pdf of type application/pdf
2008-09-05 03:41:55,927 WARN mapred.LocalJobRunner - job_local_21
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_21/job_local_21_map_0000/output/file.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:982)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
2008-09-05 09:32:46,906 INFO searcher.NutchBean - opening indexes in crawl/indexes
2008-09-05 09:32:47,002 INFO plugin.PluginRepository - Plugins: looking in: /ok/appl/nutch-2008-09-04_04-01-27/plugins
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - Registered Plugins:
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - Site Query Filter (query-site)
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - Http / Https Protocol Plug-in (protocol-httpclient)
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - MSWord Parse Plug-in (parse-msword)
2008-09-05 09:32:47,305 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Pdf Parse Plug-in (parse-pdf)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - MSExcel Parse Plug-in (parse-msexcel)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Basic Query Filter (query-basic)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - URL Query Filter (query-url)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Parse MS Documents Framework (lib-parsems)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Zip Parse Plug-in (parse-zip)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Registered Extension-Points:
2008-09-05 09:32:47,306 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2008-09-05 09:32:47,307 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2008-09-05 09:32:47,342 INFO searcher.NutchBean - opening segments in crawl/segments
2008-09-05 09:32:47,368 INFO searcher.SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2008-09-05 09:32:47,371 INFO searcher.NutchBean - opening linkdb in crawl/linkdb
2008-09-05 09:32:52,746 INFO searcher.NutchBean - opening indexes in crawl/indexes
2008-09-05 09:32:52,791 INFO plugin.PluginRepository - Plugins: looking in: /ok/appl/nutch-2008-09-04_04-01-27/plugins
2008-09-05 09:32:52,999 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2008-09-05 09:32:52,999 INFO plugin.PluginRepository - Registered Plugins:
2008-09-05 09:32:52,999 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)

> Subject: Re: Job failed!
> From: [EMAIL PROTECTED]
> To: nutch-user@lucene.apache.org
> Date: Fri, 5 Sep 2008 17:28:47 +0800
>
> Could you show the whole hadoop.log?
>
> On Fri, 2008-09-05 at 08:46 +0000, Edward Quick wrote:
> > Hi,
> >
> > I ran a crawl last night
> >
> > bin/nutch crawl urls -dir crawl -depth 10
> >
> > which collected 10612 pages, and then bailed out with the following error:
> >
> > Exception in thread "main" java.io.IOException: Job failed!
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
> >     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:552)
> >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:122)
> >
> > I checked there was enough space on the box, and there don't appear to be
> > any errors in hadoop.log or the crawl output, so I'm stuck on what caused
> > this.
> >
> > Also, is there a way to pick up the crawl from where it stopped rather than
> > having to rerun it all over again?
> >
> > Thanks for any help.
> >
> > Ed.
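Since a tail -10 missed the relevant entry and the parse-pdf warnings dominate the tail -200 output, one possible way to surface the other warnings is to filter the parser noise out. This is only a minimal sketch: the function name job_errors is made up for illustration, and it assumes the log uses the "date time LEVEL logger - message" layout seen in the excerpt above.

```shell
#!/bin/sh
# job_errors is a hypothetical helper name, not part of Nutch or Hadoop.
# It prints WARN/ERROR/FATAL entries from a log file, skipping the
# repetitive entries logged by parse.ParserFactory and parse.ParseUtil.
job_errors() {
    grep -E ' (WARN|ERROR|FATAL) ' "$1" |
        grep -v -E 'parse\.(ParserFactory|ParseUtil) '
}

# Example call (file name taken from the post):
#   job_errors hadoop.log.2008-09-05
```

Run against the excerpt above, the mapred.LocalJobRunner warning should be among the entries left over; its DiskErrorException stack trace follows it in the log on lines without a timestamp, so those still have to be read in the log itself.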