Hi Michael,

As I'm also not using the most recent stable Solr distribution (3.6.0),
I can only comment (perhaps unwisely) that the most recent version of
Solr that Nutch supports is, I believe, 3.4.0, as this is the
dependency we pull in with Ivy. It also looks like Solr and SolrJ are
released in parallel, so you could try upgrading your SolrJ dependency
if you wish to use Solr 3.6.0...
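
Incidentally, the SolrJ dependency is declared in ivy/ivy.xml in the
Nutch source tree; bumping it would look something like the following
(a rough sketch, not a verbatim copy of trunk -- check what your
checkout actually declares for the rev and conf attributes):

```xml
<!-- ivy/ivy.xml: hypothetical sketch of the SolrJ dependency bump -->
<dependency org="org.apache.solr" name="solr-solrj" rev="3.6.0"
            conf="*->default"/>
```

A fresh ant build should then pull the new jar into runtime/local/lib/.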

If the above is correct, it would explain why 3.1.0 works fine when
you roll back, as I would imagine backwards compatibility is always of
key importance.

I would be pleased to learn that the above is not correct and that
Nutch is able to index to Solr 3.6.0; if it is not, however, then
maybe we should upgrade the dependency accordingly in trunk.
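
For reference, Michael's jar-swap workaround (step 1.d, quoted below)
would amount to something like this. The install paths and version
numbers are hypothetical, and the commands are echoed rather than
executed so you can review them before running anything for real:

```shell
#!/bin/sh
# Hypothetical install locations -- adjust for your machine.
SOLR_HOME=${SOLR_HOME:-/opt/apache-solr-3.6.0}
NUTCH_HOME=${NUTCH_HOME:-/opt/apache-nutch-1.4}

# Replace the SolrJ jar bundled with Nutch with the one shipped
# alongside your Solr server, so client and server versions match.
# (dry run: remove "echo" to actually perform the swap)
echo rm "$NUTCH_HOME/runtime/local/lib/solr-solrj-3.4.0.jar"
echo cp "$SOLR_HOME/dist/apache-solr-solrj-3.6.0.jar" \
     "$NUTCH_HOME/runtime/local/lib/"
```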

Thanks

Lewis

On Thu, May 10, 2012 at 1:56 PM, Michael Erickson
<[email protected]> wrote:
>
> On May 10, 2012, at 1:42 AM, Markus Jelsma wrote:
>
>> Hi,
>>
>> On Thu, 10 May 2012 09:10:04 +0300, Tolga <[email protected]> wrote:
>>> Hi,
>>>
>>> This will sound like a duplicate, but actually it differs from the
>>> other one. Please bear with me. Following
>>> http://wiki.apache.org/nutch/NutchTutorial, I first issued the command
>>>
>>> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>>>
>>> Then when I got the message
>>>
>>> Exception in thread "main" java.io.IOException: Job failed!
>>>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>>>    at
>>> org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
>>>    at
>>> org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
>>>    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>>
>> Please include the relevant part of the log. This can be a known issue.
>>
>>>
>>> I issued the commands
>>>
>>> bin/nutch crawl urls -dir crawl -depth 3 -topN 5
>>>
>>> and
>>>
>>> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb -linkdb
>>> crawldb/linkdb crawldb/segments/*
>>>
>>> separately, after which I got no errors. When I browsed to
>>> http://localhost:8983/solr/admin and attempted a search, I got the
>>> error
>>>
>>>
>>>   HTTP ERROR 400
>>>
>>> Problem accessing /solr/select. Reason:
>>>
>>>    undefined field text
>>
>> But this is a Solr thing, you have no field named text. Resolve this in Solr 
>> or on the Solr mailing list.
>
>
> I will say that I had similar issues last week when I tried the Nutch 
> tutorial.  I went to the #Solr IRC channel and got no response.  The quick 
> answer was that I had to go back to Solr version 3.1.0 for the instructions 
> in the Nutch tutorial to work.
>
> The longer answer is that following the existing Nutch tutorial gave me two 
> errors.
>
> 1) SolrDeleteDuplicates exception as mentioned by Tolga above.
>
> To fix this I:
>
> 1.a) Stopped Solr.
> 1.b) Deleted the Solr index.
> 1.c) Copied the Nutch-provided schema.xml into the proper Solr directory 
> (example/solr/conf/).
> 1.d) Replaced Nutch's solr-solrj-xxx.jar with the appropriate version from 
> Solr:
>       ( solr/dist/apache-solr-solrj-xxx.jar  --> 
> nutch/runtime/local/lib/solr-solrj-xxx.jar )
> 1.e) Restarted Solr.
>
> The first two steps may only be necessary if you already had Solr running 
> with the default schema it provides, as I did because I had done the Solr 
> tutorial first.
>
> 2) The HTTP 400 Error "undefined field text" issue.
>
> This appears to be the same as: 
> https://issues.apache.org/jira/browse/SOLR-3416.  Log output from Solr 
> is here: http://pastebin.com/YWdPnXpv and the Nutch-provided schema is here: 
> http://pastebin.com/LQDDKC5B
>
> The only way I got this working was to move Solr from version 3.6.0 back to 
> version 3.1.0.
>
> I'm *totally* new to Solr/Nutch, but I might suggest a versioning mismatch?
>
>
> Regards,
> --mike
>
> Michael Erickson
> [email protected]
>
>



-- 
Lewis
