Hi,

I would like to do that, but I still don't understand the concept of 'batch
id'. Also, is capturing the 'batch' argument on the command line the right
direction?

Thanks.

At 2012-12-19 22:07:23,"Lewis John Mcgibbney" <[email protected]> wrote:
>Hi,
>
>The batchID is initially set by the GeneratorJob#run() method at line 169
>[0]. It can also be overridden by the generate.batch.id property in
>nutch-site.xml.
>
>If you look at line 117 in the crawl script [1], you will see a TODO to
>capture the batchID programmatically.
>
>1) I would advise you to use this crawl script.
>2) If you are able to create an issue on Jira and then submit a patch for
>it, that would be excellent.
>
>Lewis
>
>[0]
>http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/GeneratorJob.java?view=markup
>[1]
>http://svn.apache.org/viewvc/nutch/branches/2.x/src/bin/crawl?view=markup
>
>On Fri, Dec 14, 2012 at 11:49 AM, 高睿 <[email protected]> wrote:
>
>> Hi,
>>
>> When I specify solr on the command line, an exception is thrown.
>> Command line: urls -solr http://localhost:8080/solr/ -depth 1 -topN 3
>> I tried adding a '-batch 3' parameter to the command line, but it doesn't
>> help. I looked into the code and found that the parameter is ignored
>> somewhere. So, how do I fix this? Thanks.
>>
>> Skipping http://www.iguuu.com/thread-944-1-1.html; different batch id (null)
>> Skipping http://www.iguuu.com/thread-987-1-1.html; different batch id (null)
>> Exception in thread "main" java.lang.NullPointerException
>>     at java.util.Hashtable.put(Unknown Source)
>>     at java.util.Properties.setProperty(Unknown Source)
>>     at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
>>     at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
>>     at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
>>     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
>>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>>
>> Regards,
>> Rui
>>
>
>
>
>-- 
>*Lewis*
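
For anyone reading the stack trace in the original message: the bottom frames
are Properties.setProperty and Hashtable.put, and java.util.Properties rejects
null values. The sketch below reproduces only that behaviour with the plain
JDK. It assumes (the thread does not confirm this) that the null value was the
batch id, as the "different batch id (null)" log lines suggest. The class name,
the helper trySet and the sample id are made up for illustration; only the
generate.batch.id key comes from the thread.

```java
import java.util.Properties;

// Hypothetical demo class, not part of Nutch.
class NullBatchIdDemo {

    // Tries to store a batch id under the generate.batch.id key and
    // reports whether the JDK accepted the value.
    static String trySet(String batchId) {
        Properties props = new Properties();
        try {
            // Properties does not permit null keys or values.
            props.setProperty("generate.batch.id", batchId);
            return "ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        // A missing batch id, as assumed for the failing run.
        System.out.println(trySet(null));
        // An arbitrary, made-up batch id string.
        System.out.println(trySet("1355500000-12345"));
    }
}
```

If that assumption holds, any fix has to make sure a non-null batch id reaches
the indexing step, which is what capturing the batchID in the crawl script
would achieve.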
