On Thursday 10 May 2012 14:35:03 Lewis John Mcgibbney wrote:
> Hi Michael,
>
> As I'm also not using most recent stable Solr distribution (3.6.0), I
> can only comment (maybe unwisely) that the most recent version of Solr
> that Nutch supports is maybe 3.4.0 as this is the dependency we pull
> with ivy. It also looks like Solr and Solrj are released in parallel
> so maybe try upgrading your solrj dependency if you wish to use Solr
> 3.6.0...

This should not be a version issue. We happily index from trunk or 1.4 to Solr versions > 3.0. There must be some schema thing or bad Solr request handler defined.

>
> If the above is correct, then this is why 3.1.0 works fine when you
> roll back as I would imagine backwards compatibility is always of key
> importance.
>
> I would be pleased to know that the above is not correct and that
> Nutch is able to index to Solr 3.6.0, however if not then maybe we
> should upgrade accordingly in trunk.
>
> Thanks
>
> Lewis
>
> On Thu, May 10, 2012 at 1:56 PM, Michael Erickson
> <[email protected]> wrote:
> > On May 10, 2012, at 1:42 AM, Markus Jelsma wrote:
> >> Hi,
> >>
> >> On Thu, 10 May 2012 09:10:04 +0300, Tolga <[email protected]> wrote:
> >>> Hi,
> >>>
> >>> This will sound like a duplicate, but actually it differs from the
> >>> other one. Please bear with me. Following
> >>> http://wiki.apache.org/nutch/NutchTutorial, I first issued the command
> >>>
> >>> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
> >>>
> >>> Then when I got the message
> >>>
> >>> Exception in thread "main" java.io.IOException: Job failed!
> >>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
> >>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
> >>>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> >>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> >>
> >> Please include the relevant part of the log. This can be a known issue.
> >>
> >>> I issued the commands
> >>>
> >>> bin/nutch crawl urls -dir crawl -depth 3 -topN 5
> >>>
> >>> and
> >>>
> >>> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb -linkdb crawldb/linkdb crawldb/segments/*
> >>>
> >>> separately, after which I got no errors.
> >>> When I browsed to http://localhost:8983/solr/admin and attempted a
> >>> search, I got the error
> >>>
> >>> HTTP ERROR 400
> >>>
> >>> Problem accessing /solr/select. Reason:
> >>>
> >>> undefined field text
> >>
> >> But this is a Solr thing, you have no field named text. Resolve this in
> >> Solr or on the Solr mailing list.
> >
> > I will say that I had similar issues last week when I tried the Nutch
> > tutorial. I went to the #Solr IRC channel and got no response. The
> > quick answer was that I had to go back to Solr version 3.1.0 for the
> > instructions in the Nutch tutorial to work.
> >
> > The longer answer is that following the existing Nutch tutorial gave me
> > two errors.
> >
> > 1) SolrDeleteDuplicates exception as mentioned by Tolga above.
> >
> > To fix this I:
> >
> > 1.a) Stop Solr.
> > 1.b) Delete Solr index.
> > 1.c) Copy the Nutch-provided schema.xml into the proper Solr directory
> >      (example/solr/conf/).
> > 1.d) Replace Nutch's solr-solrj-xxx.jar with the appropriate version
> >      from Solr: ( solr/dist/apache-solr-solrj-xxx.jar -->
> >      nutch/runtime/local/lib/solr-solrj-xxx.jar )
> > 1.e) Restart Solr.
> >
> > The first two steps may only be necessary if you had Solr running already
> > using the default schema that they provided as I did because I had done
> > the Solr tutorial first.
> >
> > 2) The HTTP 400 Error "undefined field text" issue.
> >
> > This appears to be the same as:
> > https://issues.apache.org/jira/browse/SOLR-3416. Log output from Solr
> > output is here: http://pastebin.com/YWdPnXpv and the Nutch provided
> > schema is here: http://pastebin.com/LQDDKC5B
> >
> > The only way I got this working was to move Solr from version 3.6.0 back
> > to version 3.1.0.
> >
> > I'm *totally* new to Solr/Nutch, but I might suggest a versioning
> > mismatch?
> >
> >
> > Regards,
> > --mike
> >
> > Michael Erickson
> > [email protected]

--
Markus Jelsma - CTO - Openindex
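
For reference, steps 1.b to 1.d from Michael's message could be scripted roughly as below. This is only a sketch under stated assumptions, not a tested procedure from the thread: the function name sync_nutch_to_solr and its two arguments are made up for illustration, the index is assumed to live under example/solr/data, and the Nutch schema is assumed to sit in runtime/local/conf/. Stopping and restarting Solr (1.a and 1.e) is left to the operator because it depends on how Solr was started.

```shell
#!/usr/bin/env bash
# Sketch of steps 1.b to 1.d. All paths are ASSUMPTIONS; adjust to your install.
set -eu

# Hypothetical helper; not part of Nutch or Solr.
#   $1 = root of the unpacked Solr distribution
#   $2 = root of the Nutch installation
sync_nutch_to_solr() {
  solr_home=$1
  nutch_home=$2

  # 1.a) Stop Solr before calling this function (not scripted here).

  # 1.b) Delete the existing Solr index.
  # ASSUMPTION: the index lives under example/solr/data.
  rm -rf "$solr_home/example/solr/data"

  # 1.c) Copy the Nutch-provided schema.xml into Solr's conf directory.
  # ASSUMPTION: the schema is at runtime/local/conf/schema.xml.
  cp "$nutch_home/runtime/local/conf/schema.xml" \
     "$solr_home/example/solr/conf/schema.xml"

  # 1.d) Replace Nutch's SolrJ jar with the one shipped in solr/dist.
  rm -f "$nutch_home"/runtime/local/lib/solr-solrj-*.jar
  for jar in "$solr_home"/dist/apache-solr-solrj-*.jar; do
    name=$(basename "$jar")
    # apache-solr-solrj-xxx.jar --> solr-solrj-xxx.jar, as in the message.
    # If no jar matches the glob, cp fails and the script stops (set -e).
    cp "$jar" "$nutch_home/runtime/local/lib/${name#apache-}"
  done

  # 1.e) Restart Solr afterwards (not scripted here).
}
```

It would be called with the two install roots, for example: sync_nutch_to_solr /path/to/solr /path/to/nutch (both paths are placeholders). Note that this only covers error 1); it does nothing about the "undefined field text" problem described in 2).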

