Hi guys, I am running the job below (MtM2Fetch is an implementation of the Fetch job):
./m2mapred org.apache.m2.model.jobserver.backend.MtM2Fetch /crawled/segments/20100806195139 /tmp/hadoop-hdpadmin/mapred/temp/generate-temp-1281095547871 1 1 10

(MtM2Fetch.java is attached: http://lucene.472066.n3.nabble.com/file/n1036428/MtM2Fetch.java)

But I get the error below:

10/08/06 20:38:59 INFO mapred.JobClient: Running job: job_201008070125_0026
10/08/06 20:39:00 INFO mapred.JobClient: map 0% reduce 0%
10/08/06 20:39:09 INFO mapred.JobClient: Task Id : attempt_201008070125_0026_m_000000_0, Status : FAILED
java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
    at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:79)
    at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:50)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:459)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:908)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
attempt_201008070125_0026_m_000000_0: ----------------this.parsePluginList = null -----------
attempt_201008070125_0026_m_000000_0: -------------------come to else and the plugin file is /opt/bigsheets-m5/conf/parse-plugins.xml------------
attempt_201008070125_0026_m_000000_0: ----------------/opt/bigsheets-m5/conf/parse-plugins.xml not found -------------
attempt_201008070125_0026_m_000000_0: ---------------the inputSource is org.xml.sax.inputsou...@324d324d --------------
attempt_201008070125_0026_m_000000_0: -------------------after new Instance---------------
attempt_201008070125_0026_m_000000_0: -------------------after factory.newDocumentBuilder --------------
attempt_201008070125_0026_m_000000_0: --------------------come to exception phase -----------------
attempt_201008070125_0026_m_000000_0: --------------------come to Log Warn phase,the file is null, null-----------------
10/08/06 20:39:09 INFO mapred.JobClient: Task Id : attempt_201008070125_0026_m_000001_0, Status : FAILED
java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
    at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:79)
    at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:50)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:459)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:908)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

To find the root cause, I added some logging to ParsePluginList.java, shown below (that is why the output above contains so many extra messages):

    InputStream ppInputStream = null;
    if (fParsePluginsFile != null) {
      System.out.printf("-------------------come to if block---------------\n");
      URL parsePluginUrl = null;
      try {
        parsePluginUrl = new URL(fParsePluginsFile);
        ppInputStream = parsePluginUrl.openStream();
      } catch (Exception e) {
        if (LOG.isWarnEnabled()) {
          LOG.warn("Unable to load parse plugins file from URL " +
                   "[" + fParsePluginsFile + "]. Reason is [" + e + "]");
        }
        return pList;
      }
    } else {
      System.out.printf("-------------------come to else and the plugin file is %s------------\n",
                        conf.get("parse.plugin.file"));
      ppInputStream = conf.getConfResourceAsInputStream(conf.get(PP_FILE_PROP));
      // Debug only: check whether the Configuration can resolve the file at all.
      URL url = conf.getResource(conf.get("parse.plugin.file"));
      if (url == null) {
        System.out.printf("----------------%s not found -------------\n",
                          conf.get("parse.plugin.file"));
      } else {
        System.out.printf("------------------found resource %s ----------------\n",
                          conf.get("parse.plugin.file"));
      }
    }
    inputSource = new InputSource(ppInputStream);
    System.out.printf("---------------the inputSource is %s --------------\n",
                      inputSource.toString());

The cause seems to be that conf.getResource() returns null, i.e. we cannot get a URL for the file from the Configuration object (the "not found" message in the output above). Can anyone tell me why? /opt/bigsheets-m5/conf/parse-plugins.xml is definitely there!
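One possible explanation, which I have not confirmed on the cluster: as far as I understand, Hadoop's Configuration.getResource() simply delegates to ClassLoader.getResource(), which treats its argument as a name relative to the classpath entries and not as a filesystem path. The standalone sketch below uses no Hadoop or Nutch classes. The class name ResourceLookupDemo, its helper methods, and the temporary directory standing in for the conf directory are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceLookupDemo {

    // Hypothetical stand-in for the conf directory: a temporary directory
    // that contains an empty parse-plugins.xml.
    static Path makeConfDir() {
        try {
            Path dir = Files.createTempDirectory("conf");
            Files.createFile(dir.resolve("parse-plugins.xml"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resolves a resource name through a class loader that has confDir as
    // a classpath entry, which is the kind of lookup
    // ClassLoader.getResource() performs.
    static URL lookup(Path confDir, String name) {
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { confDir.toUri().toURL() })) {
            return loader.getResource(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path confDir = makeConfDir();
        Path file = confDir.resolve("parse-plugins.xml");

        // The file is really on disk ...
        System.out.println("exists on disk      : " + Files.exists(file));
        // ... but its absolute path is not a valid classpath resource name.
        System.out.println("absolute path lookup: " + lookup(confDir, file.toString()));
        // The bare file name resolves, because confDir is on the classpath.
        System.out.println("relative name lookup: " + lookup(confDir, "parse-plugins.xml"));
    }
}
```

If this is what happens in the task JVM, then setting parse.plugin.file to the bare name parse-plugins.xml, with the conf directory on the task classpath, might behave differently from the absolute path. That is an assumption on my side, not something I have verified.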
Thank you very much!

