Hi Guys,

I am running a job like the one below (MtM2Fetch is actually an implementation
of the Fetch job):

./m2mapred org.apache.m2.model.jobserver.backend.MtM2Fetch \
    /crawled/segments/20100806195139 \
    /tmp/hadoop-hdpadmin/mapred/temp/generate-temp-1281095547871 1 1 10

But I get the error below (the source of MtM2Fetch is attached as
MtM2Fetch.java: http://lucene.472066.n3.nabble.com/file/n1036428/MtM2Fetch.java):

10/08/06 20:38:59 INFO mapred.JobClient: Running job: job_201008070125_0026
10/08/06 20:39:00 INFO mapred.JobClient:  map 0% reduce 0%
10/08/06 20:39:09 INFO mapred.JobClient: Task Id : attempt_201008070125_0026_m_000000_0, Status : FAILED
java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
        at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:79)
        at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:50)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:459)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:908)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

attempt_201008070125_0026_m_000000_0: ----------------this.parsePluginList = null -----------
attempt_201008070125_0026_m_000000_0: -------------------come to else and the plugin file is /opt/bigsheets-m5/conf/parse-plugins.xml------------
attempt_201008070125_0026_m_000000_0: ----------------/opt/bigsheets-m5/conf/parse-plugins.xml not found -------------
attempt_201008070125_0026_m_000000_0: ---------------the inputSource is org.xml.sax.inputsou...@324d324d --------------
attempt_201008070125_0026_m_000000_0: -------------------after new Instance---------------
attempt_201008070125_0026_m_000000_0: -------------------after factory.newDocumentBuilder --------------
attempt_201008070125_0026_m_000000_0: --------------------come to exception phase -----------------
attempt_201008070125_0026_m_000000_0: --------------------come to Log Warn phase,the file is null, null-----------------
10/08/06 20:39:09 INFO mapred.JobClient: Task Id : attempt_201008070125_0026_m_000001_0, Status : FAILED
java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
        at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:79)
        at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:50)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.<init>(Fetcher.java:459)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:908)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

To find the root cause, I added some debug output to ParsePluginList.java, as
shown below (that is why the log above contains so many extra messages):

    InputStream ppInputStream = null;
    if (fParsePluginsFile != null) {
      // An explicit plugins file was given: parse it as a URL and open it directly
      System.out.printf("-------------------come to if block---------------\n");
      URL parsePluginUrl = null;
      try {
        parsePluginUrl = new URL(fParsePluginsFile);
        ppInputStream = parsePluginUrl.openStream();
      } catch (Exception e) {
        if (LOG.isWarnEnabled()) {
          LOG.warn("Unable to load parse plugins file from URL " +
                   "[" + fParsePluginsFile + "]. Reason is [" + e + "]");
        }
        return pList;
      }
    } else {
      // No explicit file: look the configured name up as a configuration resource
      System.out.printf("-------------------come to else and the plugin file is %s------------\n",
                        conf.get("parse.plugin.file"));
      ppInputStream = conf.getConfResourceAsInputStream(conf.get(PP_FILE_PROP));
      // Debug only: check whether the same name resolves to a URL at all
      URL url = conf.getResource(conf.get("parse.plugin.file"));
      if (url == null) {
        System.out.printf("----------------%s not found -------------\n",
                          conf.get("parse.plugin.file"));
      } else {
        System.out.printf("------------------found resource %s ----------------\n",
                          conf.get("parse.plugin.file"));
      }
    }
    // If the lookup failed, ppInputStream is still null at this point
    inputSource = new InputSource(ppInputStream);
    System.out.printf("---------------the inputSource is %s --------------\n",
                      inputSource.toString());
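The two branches above can be mimicked outside Hadoop with a small, self-contained sketch. It assumes (not verified here against the Hadoop source) that Configuration.getResource() delegates to a ClassLoader and that getConfResourceAsInputStream() returns null when that lookup fails; ParsePluginsLookup, locate() and open() are hypothetical names, not Nutch or Hadoop APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper, not part of Nutch or Hadoop: mirrors the two branches of the
// debug code (explicit URL first, otherwise a class-loader resource lookup).
class ParsePluginsLookup {

  /** Returns the URL the plugins file would be read from, or null if it cannot be located. */
  static URL locate(String explicitUrl, String resourceName, ClassLoader loader) {
    if (explicitUrl != null) {
      try {
        // Branch 1: the value is parsed as a URL (e.g. "file:/..."); nothing is searched
        return new URL(explicitUrl);
      } catch (IOException e) { // MalformedURLException extends IOException
        return null;
      }
    }
    // Branch 2: the name is resolved against the class loader's classpath roots
    // (assumed to be what Configuration.getResource() does)
    return loader.getResource(resourceName);
  }

  /** Opens the located file, or returns null (assumed getConfResourceAsInputStream behavior). */
  static InputStream open(String explicitUrl, String resourceName, ClassLoader loader) {
    URL url = locate(explicitUrl, resourceName, loader);
    if (url == null) {
      return null; // corresponds to ppInputStream staying null in the code above
    }
    try {
      return url.openStream();
    } catch (IOException e) {
      return null;
    }
  }
}
```

Under these assumptions, an absolute filesystem path passed as the resource name in branch 2 is not treated as a path at all, while the same path written as a file: URL in branch 1 is used as-is.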


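As far as I can tell, the cause is that Configuration.getResource() returns
null for this file, i.e. no URL can be obtained for it from the Configuration
object (hence the "not found" line in the output above). Can anyone tell me
why? The file /opt/bigsheets-m5/conf/parse-plugins.xml definitely exists!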
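One way to narrow this down is to run a small check on the machine where the map task actually runs (the "not found" message above was printed by the task attempt, not by the client). The sketch below is hypothetical diagnostic code, not a Nutch or Hadoop API; it separates "the file exists on this disk" from "the name resolves as a class-loader resource", on the assumption that getResource() performs the second kind of lookup.

```java
import java.io.File;
import java.net.URL;

// Hypothetical diagnostic, not a Nutch or Hadoop API: compares a plain filesystem
// check with a class-loader resource lookup for the same name.
class ResourceCheck {

  static boolean existsOnDisk(String name) {
    return new File(name).isFile();
  }

  static boolean resolvesOnClasspath(String name, ClassLoader loader) {
    URL url = loader.getResource(name); // searches classpath roots, not the filesystem root
    return url != null;
  }

  static String describe(String name, ClassLoader loader) {
    return String.format("%s: onDisk=%b, onClasspath=%b",
        name, existsOnDisk(name), resolvesOnClasspath(name, loader));
  }

  public static void main(String[] args) {
    // Defaults to the value printed for parse.plugin.file in the debug output
    String name = args.length > 0 ? args[0] : "/opt/bigsheets-m5/conf/parse-plugins.xml";
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    System.out.println(describe(name, loader));
  }
}
```

A result such as onDisk=true, onClasspath=false would suggest the file is present but not reachable under that name through a class-loader lookup.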

Thank you very much!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Parse-Plugins-preferences-could-not-be-loaded-error-when-fetch-using-Nutch-tp1036428p1036428.html
Sent from the Nutch - User mailing list archive at Nabble.com.