Hello, I'm trying to get a Nutch job to run on a Hadoop cluster, so far without
success. When it reaches nutch solrindex I get the exception below. I'm using
Nutch 1.3, checked out from trunk yesterday (for debugging). The Solr instance
in use is 3.1. Weirdly, I ran a test script once and it worked. I then wiped
the index and crawl data and tried again with my own Nutch instance, and got
this exception. I tried once more with the Nutch debug instance and the same
exception still appears.

11/05/26 23:41:46 INFO solr.SolrIndexer: SolrIndexer: starting at 2011-05-26 23:41:46
11/05/26 23:41:46 INFO indexer.IndexerMapReduce: IndexerMapReduce: crawldb: gabriele/crawl/crawldb
11/05/26 23:41:46 INFO indexer.IndexerMapReduce: IndexerMapReduce: linkdb: gabriele/crawl/linkdb
11/05/26 23:41:46 INFO indexer.IndexerMapReduce: IndexerMapReduces: adding segment: hdfs://loocia-c1/user/gkahlout/gabriele/crawl/segments/20110526234054
11/05/26 23:41:46 INFO mapred.FileInputFormat: Total input paths to process : 6
11/05/26 23:41:47 INFO mapred.JobClient: Running job: job_201103141146_0703
11/05/26 23:41:48 INFO mapred.JobClient: map 0% reduce 0%
11/05/26 23:41:56 INFO mapred.JobClient: map 66% reduce 0%
11/05/26 23:41:58 INFO mapred.JobClient: map 100% reduce 0%
11/05/26 23:42:04 INFO mapred.JobClient: map 100% reduce 33%
11/05/26 23:42:06 INFO mapred.JobClient: Task Id : attempt_201103141146_0703_r_000000_0, Status : FAILED
java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
11/05/26 23:42:07 INFO mapred.JobClient: map 100% reduce 0%
11/05/26 23:42:19 INFO mapred.JobClient: map 100% reduce 27%
11/05/26 23:42:21 INFO mapred.JobClient: Task Id : attempt_201103141146_0703_r_000000_1, Status : FAILED
java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
11/05/26 23:42:22 INFO mapred.JobClient: map 100% reduce 0%
11/05/26 23:42:31 INFO mapred.JobClient: map 100% reduce 27%
11/05/26 23:42:33 INFO mapred.JobClient: Task Id : attempt_201103141146_0703_r_000000_2, Status : FAILED
java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
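
Reading the exception message literally, the client seems to have found a leading version marker of 1 in the response where it expected 2. Below is a minimal Python sketch of that kind of check, only to illustrate what the message appears to describe. It is not the SolrJ code: the function name check_javabin_version is hypothetical, the expected value 2 is taken from the message itself, and it assumes the version marker is the first byte of the response body.

```python
# Illustrative only: mimics the version check that the exception message
# appears to describe. Names here are hypothetical, not part of SolrJ.

EXPECTED_VERSION = 2  # taken from the message text: "expected 2, but 1"


def check_javabin_version(payload: bytes, expected: int = EXPECTED_VERSION) -> int:
    """Return the leading version byte, or raise if it does not match."""
    if not payload:
        raise ValueError("empty response body")
    # Assumption: the first byte of the response carries the format version.
    version = payload[0]
    if version != expected:
        raise RuntimeError(
            f"Invalid version (expected {expected}, but {version}) "
            "or the data is not in 'javabin' format"
        )
    return version


if __name__ == "__main__":
    # A body starting with byte 2 passes the check.
    print(check_javabin_version(bytes([2, 0, 0])))
    # A body starting with byte 1 is rejected, as in the job log.
    try:
        check_javabin_version(bytes([1, 0, 0]))
    except RuntimeError as err:
        print(err)
```

If that reading is right, either the client and the server disagree on the response format version, or the response body is not javabin data at all. I have not confirmed which of the two applies here.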
--
Regards,
K. Gabriele