Hi,

Here's more detail:
In Crawler.java:
    for (int i = 0; i < args.length; i++) {
      if ("-threads".equals(args[i])) {
        threads = Integer.parseInt(args[i+1]);
        i++;
      } else if ("-depth".equals(args[i])) {
        depth = Integer.parseInt(args[i+1]);
        i++;
      } else if ("-topN".equals(args[i])) {
        topN = Integer.parseInt(args[i+1]);
        i++;
      } else if ("-solr".equals(args[i])) {
        solrUrl = StringUtils.lowerCase(args[i + 1]);
        i++;
      } else if ("-numTasks".equals(args[i])) {
        numTasks = Integer.parseInt(args[i+1]);
        i++;
      } else if ("-continue".equals(args[i])) {
        // skip
      } else if (args[i] != null) {
        seedDir = args[i];
      }
    }
    Map<String,Object> argMap = ToolUtil.toArgMap(
        Nutch.ARG_THREADS, threads,
        Nutch.ARG_DEPTH, depth,
        Nutch.ARG_TOPN, topN,
        Nutch.ARG_SOLR, solrUrl,
        Nutch.ARG_SEEDDIR, seedDir,
        Nutch.ARG_NUMTASKS, numTasks);
    run(argMap);

So argMap doesn't contain a 'batch' argument, but SolrIndexerJob.java tries to
read that argument's value, which is therefore null:

  @Override
  public Map<String,Object> run(Map<String,Object> args) throws Exception {
    String solrUrl = (String)args.get(Nutch.ARG_SOLR);
    String batchId = (String)args.get(Nutch.ARG_BATCH);
    NutchIndexWriterFactory.addClassToConf(getConf(), SolrWriter.class);
    getConf().set(SolrConstants.SERVER_URL, solrUrl);

    currentJob = createIndexJob(getConf(), "solr-index", batchId);
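One possible fix is to have the crawler accept a '-batch <id>' flag and forward it under the key that SolrIndexerJob reads. The following is a minimal sketch under that assumption, not the actual Nutch code: the class CrawlerArgs, its parse method, and the string keys standing in for Nutch.ARG_SOLR, Nutch.ARG_BATCH and Nutch.ARG_SEEDDIR are hypothetical, a plain HashMap replaces ToolUtil.toArgMap, and the other flags are left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class CrawlerArgs {
  // Hypothetical stand-ins for the Nutch.ARG_* constants.
  public static final String ARG_SOLR = "solr";
  public static final String ARG_BATCH = "batch";
  public static final String ARG_SEEDDIR = "seedDir";

  // Parses the same style of command line as Crawler, plus a "-batch" flag.
  // Only -solr and -batch are handled; -threads, -depth, etc. are omitted.
  public static Map<String, Object> parse(String[] args) {
    Map<String, Object> argMap = new HashMap<>();
    for (int i = 0; i < args.length; i++) {
      if ("-solr".equals(args[i])) {
        argMap.put(ARG_SOLR, args[++i]);
      } else if ("-batch".equals(args[i])) {
        // New branch: it must come before the catch-all branch below,
        // otherwise "-batch" is treated as a seed directory.
        argMap.put(ARG_BATCH, args[++i]);
      } else if (args[i] != null) {
        argMap.put(ARG_SEEDDIR, args[i]);
      }
    }
    return argMap;
  }

  public static void main(String[] args) {
    Map<String, Object> m = parse(new String[] {
        "urls", "-solr", "http://localhost:8080/solr/", "-batch", "3"});
    System.out.println(m.get(ARG_BATCH)); // prints 3
  }
}
```

With the batch id present in the map, the args.get(Nutch.ARG_BATCH) lookup in SolrIndexerJob would return a non-null value instead of null.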

Then IndexerJob.java throws a NullPointerException, because conf.set() is
called with the null batchId:

  protected Job createIndexJob(Configuration conf, String jobName, String batchId)
  throws IOException, ClassNotFoundException {
    conf.set(GeneratorJob.BATCH_ID, batchId);
    Job job = new NutchJob(conf, jobName);
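The trace ends in Hashtable.put because Configuration.set hands the value to java.util.Properties.setProperty, which rejects null values. Below is a minimal sketch of a defensive guard, assuming it is acceptable to leave the property unset when no batch id was supplied (what IndexerJob then does without a batch id is not shown here). A plain Properties object stands in for Hadoop's Configuration, and the class, method and key names are hypothetical; the real key is GeneratorJob.BATCH_ID.

```java
import java.util.Properties;

public class BatchIdGuard {
  // Hypothetical key; the real code uses GeneratorJob.BATCH_ID.
  public static final String BATCH_ID_KEY = "example.batch.id";

  // Properties.setProperty(key, null) throws NullPointerException, which is
  // the failure in the trace. Skip the call when there is no batch id.
  public static boolean setBatchId(Properties conf, String batchId) {
    if (batchId == null) {
      return false; // leave the property unset instead of crashing
    }
    conf.setProperty(BATCH_ID_KEY, batchId);
    return true;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    try {
      conf.setProperty(BATCH_ID_KEY, null);
    } catch (NullPointerException e) {
      System.out.println("setProperty(key, null) throws NPE");
    }
    System.out.println(setBatchId(conf, null)); // prints false
    System.out.println(setBatchId(conf, "3"));  // prints true
  }
}
```

This only avoids the crash; passing a real batch id from the command line is still needed if the indexer should select a specific batch.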



At 2012-12-14 19:49:12,"高睿" <[email protected]> wrote:

Hi,

When I specify -solr on the command line, an exception is thrown.
Command line: urls -solr http://localhost:8080/solr/ -depth 1 -topN 3
I tried adding a '-batch 3' parameter to the command line, but it doesn't help.
Looking into the code, I found that the parameter is ignored somewhere.
So, how do I fix this? Thanks.
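Where '-batch 3' gets lost can be reproduced with a reduced copy of the Crawler loop quoted above (only -solr, -depth and -topN are kept; the class SeedDirFallThrough is hypothetical). Assuming '-batch 3' was appended at the end of the command, '-batch' matches no flag branch, so both it and its value fall into the catch-all seedDir = args[i] branch, and the seed directory ends up as '3'.

```java
public class SeedDirFallThrough {
  // Mirrors the catch-all branch of the Crawler loop: any argument that is
  // not a recognized flag (or a recognized flag's value) becomes seedDir.
  public static String seedDirFor(String[] args) {
    String seedDir = null;
    for (int i = 0; i < args.length; i++) {
      if ("-solr".equals(args[i]) || "-depth".equals(args[i])
          || "-topN".equals(args[i])) {
        i++; // skip the flag's value
      } else if (args[i] != null) {
        seedDir = args[i]; // "-batch" and then "3" both land here
      }
    }
    return seedDir;
  }

  public static void main(String[] args) {
    String[] cmd = {"urls", "-solr", "http://localhost:8080/solr/",
        "-depth", "1", "-topN", "3", "-batch", "3"};
    System.out.println(seedDirFor(cmd)); // prints 3
  }
}
```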

Skipping http://www.iguuu.com/thread-944-1-1.html; different batch id (null)
Skipping http://www.iguuu.com/thread-987-1-1.html; different batch id (null)
Exception in thread "main" java.lang.NullPointerException
    at java.util.Hashtable.put(Unknown Source)
    at java.util.Properties.setProperty(Unknown Source)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
    at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Regards,
Rui


