Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-24 Thread Tom Chiverton

Oh, I misunderstood then.

Tom


On 21/10/16 19:52, lewis john mcgibbney wrote:

No, it's definitely not worth removing the schema.xml, as it works
perfectly. Please have a peek at the previous commands I posted. They will
set you up with what you need.




Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-20 Thread Tom Chiverton
I ended up building my own 'nutch' configset - the managed-schema is the 
only tricky thing to get right.


Is it worth removing the schema.xml from Nutch (as it doesn't work with 
Solr 5 or 6) and replacing it with a suggested configset to use instead?


Tom


On 20/10/16 08:35, lewis john mcgibbney wrote:

Hi Tom,
This looks like it has been frustrating for you so I've provided a walk
through of how I can set up a core using current Nutch 2.X schema.xml

On Mon, Oct 17, 2016 at 9:27 AM, <user-digest-h...@nutch.apache.org> wrote:


From: Tom Chiverton <t...@extravision.com>
To: user@nutch.apache.org
Cc:
Date: Mon, 17 Oct 2016 09:55:53 +0100
Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
I tried that, and it still gives

ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0

Tom



lmcgibbn@LMC-056430 /usr/local/solr-6.2.1 $ cp /usr/local/nutch2/conf/schema.xml example/files/conf/
lmcgibbn@LMC-056430 /usr/local/solr-6.2.1 $ ./bin/solr start
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=49222). Happy searching!

lmcgibbn@LMC-056430 /usr/local/solr-6.2.1 $ ./bin/solr create -c nutch -d /usr/local/solr-6.2.1/example/files/conf -p 8983

Copying configuration to new core instance directory:
/usr/local/solr-6.2.1/server/solr/nutch

Creating new core 'nutch' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=nutch&instanceDir=nutch

{
  "responseHeader":{
    "status":0,
    "QTime":1657},
  "core":"nutch"}

I can now run my crawls on Nutch 2.X. Can you please replicate the above
then tell me where and if anything goes wrong?
Thanks
Lewis


__
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
__




Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-20 Thread lewis john mcgibbney
Hi Tom,
This looks like it has been frustrating for you so I've provided a walk
through of how I can set up a core using current Nutch 2.X schema.xml

On Mon, Oct 17, 2016 at 9:27 AM, <user-digest-h...@nutch.apache.org> wrote:

>
> From: Tom Chiverton <t...@extravision.com>
> To: user@nutch.apache.org
> Cc:
> Date: Mon, 17 Oct 2016 09:55:53 +0100
> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> I tried that, and it still gives
>
> ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
> Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0
>
> Tom
>
>
lmcgibbn@LMC-056430 /usr/local/solr-6.2.1 $ cp /usr/local/nutch2/conf/schema.xml example/files/conf/
lmcgibbn@LMC-056430 /usr/local/solr-6.2.1 $ ./bin/solr start
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=49222). Happy searching!

lmcgibbn@LMC-056430 /usr/local/solr-6.2.1 $ ./bin/solr create -c nutch -d /usr/local/solr-6.2.1/example/files/conf -p 8983

Copying configuration to new core instance directory:
/usr/local/solr-6.2.1/server/solr/nutch

Creating new core 'nutch' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=nutch&instanceDir=nutch

{
  "responseHeader":{
    "status":0,
    "QTime":1657},
  "core":"nutch"}

I can now run my crawls on Nutch 2.X. Can you please replicate the above
then tell me where and if anything goes wrong?
Thanks
Lewis


Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton
OK, so where is a known good one for Solr 5 ? Maybe Felix could post his 
and we could put that in the Nutch distro ?


Tom


On 14/10/16 16:39, Markus Jelsma wrote:

Yes, that file is probably incredibly old and never maintained. You can safely 
remove those options from the schema.
M.

  
  
-Original message-

From:Tom Chiverton <t...@extravision.com>
Sent: Friday 14th October 2016 17:34
To: user@nutch.apache.org
Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

Hmm, that makes a certain sense.

But then I seem to be getting errors like

ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0

Like the Nutch schema is for a much older Solr ?

What version were you running ?

On 14/10/16 16:17, Felix von Zadow wrote:

I had the same problem a while ago; I accidentally forgot to supply the schema 
when creating a core and had the digest/string*s* problem. Here's some more 
explanation of what I did, I hope I remember correctly:

I am using (just like Markus suggested) the schema.xml from nutch which sets [...]

-Original message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Friday, 14 October 2016 17:05
To: user@nutch.apache.org
Subject: RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

Yes, current Solr comes with a schemaless mode, which can cause errors in
some cases. It must be configured to use classic schema mode, then you provide
it with the schema.xml file you can find in Nutch's conf directory.

M.



-Original message-

From:Tom Chiverton <t...@extravision.com>
Sent: Friday 14th October 2016 16:58
To: user@nutch.apache.org
Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

I don't understand what you mean here. I am not a Solr expert, though
I've used it a bit in the past, though not with Nutch.

Is there a schema I should be feeding it ?

Tom


On 14/10/16 15:50, Markus Jelsma wrote:

Solr supports schemaless mode, which may be your case. Perhaps it made
your digest field multi valued. I'd suggest to use Solr's classic schema
factory, and a fixed schema.






RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Markus Jelsma
You can probably use the one from Nutch 1.12, it certainly got more updates.
M.

 
 
-Original message-
> From:Tom Chiverton <t...@extravision.com>
> Sent: Friday 14th October 2016 17:44
> To: user@nutch.apache.org
> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> 
> OK, so where is a known good one for Solr 5 ? Maybe Felix could post his 
> and we could put that in the Nutch distro ?
> 
> Tom
> 
> 
> On 14/10/16 16:39, Markus Jelsma wrote:
> > Yes, that file is probably incredibly old and never maintained. You can 
> > safely remove those options from the schema.
> > M.
> >
> >   
> >   
> > -Original message-
> >> From:Tom Chiverton <t...@extravision.com>
> >> Sent: Friday 14th October 2016 17:34
> >> To: user@nutch.apache.org
> >> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> >>
> >> Hmm, that makes a certain sense.
> >>
> >> But then I seem to be getting errors like
> >>
> >> ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
> >> Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0
> >>
> >> Like the Nutch schema is for a much older Solr ?
> >>
> >> What version were you running ?
> >>
> >> On 14/10/16 16:17, Felix von Zadow wrote:
> >>> I had the same problem a while ago; I accidentally forgot to supply the 
> >>> schema when creating a core and had the digest/string*s* problem. Here's 
> >>> some more explanation of what I did, I hope I remember correctly:
> >>>
> >>> I am using (just like Markus suggested) the schema.xml from nutch which 
> >>> sets [...] and [...]
> >>>
> >>> I duplicated the data_driven_schema_configs/conf/ configset and replaced 
> >>> the managed-schema file with Nutch's schema.xml so I have:
> >>> [...]/solr/configsets/my_config/managed_schema
> >>>
> >>> Core is created like so:
> >>> solr create -c corename -d my_config
> >>>
> >>>
> >>> Hope that helps,
> >>> Felix
> >>>
> >>>> -Original message-
> >>>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> >>>> Sent: Friday, 14 October 2016 17:05
> >>>> To: user@nutch.apache.org
> >>>> Subject: RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> >>>>
> >>>> Yes, current Solr comes with a schemaless mode, which can cause errors in
> >>>> some cases. It must be configured to use classic schema mode, then you
> >>>> provide it with the schema.xml file you can find in Nutch's conf directory.
> >>>>
> >>>> M.
> >>>>
> >>>>
> >>>>
> >>>> -Original message-
> >>>>> From:Tom Chiverton <t...@extravision.com>
> >>>>> Sent: Friday 14th October 2016 16:58
> >>>>> To: user@nutch.apache.org
> >>>>> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> >>>>>
> >>>>> I don't understand what you mean here. I am not a Solr expert, though
> >>>>> I've used it a bit in the past, though not with Nutch.
> >>>>>
> >>>>> Is there a schema I should be feeding it ?
> >>>>>
> >>>>> Tom
> >>>>>
> >>>>>
> >>>>> On 14/10/16 15:50, Markus Jelsma wrote:
> >>>>>> Solr supports schemaless mode, which may be your case. Perhaps it made
> >>>>>> your digest field multi valued. I'd suggest to use Solr's classic schema
> >>>>>> factory, and a fixed schema.
> >
> 
> 


RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Markus Jelsma
Yes, that file is probably incredibly old and never maintained. You can safely 
remove those options from the schema.
M.
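
One way to follow the suggestion above is to strip the rejected attribute from a copy of the schema before creating the core. The sketch below is only an illustration: the function name `strip_position_increments` and the file names in the usage comment are made up, and it assumes the attribute appears in the usual `enablePositionIncrements="true"` (or `"false"`) form named in the error message. Other options rejected by newer Solr versions would need their own handling.

```shell
#!/bin/sh
# strip_position_increments IN OUT   (hypothetical helper)
# Writes a copy of schema file IN to OUT with every
# enablePositionIncrements="true|false" attribute removed.
strip_position_increments() {
  # The leading whitespace is removed together with the attribute so the
  # surrounding element stays tidy.
  sed -E 's/[[:space:]]+enablePositionIncrements="(true|false)"//g' "$1" > "$2"
}

# Example (file names are placeholders):
#   strip_position_increments conf/schema.xml /tmp/schema-cleaned.xml
```

Writing to a separate output file keeps the original schema.xml untouched, so the two can be diffed before the cleaned copy is handed to Solr.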

 
 
-Original message-
> From:Tom Chiverton <t...@extravision.com>
> Sent: Friday 14th October 2016 17:34
> To: user@nutch.apache.org
> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> 
> Hmm, that makes a certain sense.
> 
> But then I seem to be getting errors like
> 
> ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch] 
> Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0
> 
> Like the Nutch schema is for a much older Solr ?
> 
> What version were you running ?
> 
> On 14/10/16 16:17, Felix von Zadow wrote:
> > I had the same problem a while ago; I accidentally forgot to supply the 
> > schema when creating a core and had the digest/string*s* problem. Here's 
> > some more explanation of what I did, I hope I remember correctly:
> >
> > I am using (just like Markus suggested) the schema.xml from nutch which sets
> > [...] and [...]
> >
> > I duplicated the data_driven_schema_configs/conf/ configset and replaced 
> > the managed-schema file with Nutch's schema.xml so I have:
> > [...]/solr/configsets/my_config/managed_schema
> >
> > Core is created like so:
> > solr create -c corename -d my_config
> >
> >
> > Hope that helps,
> > Felix
> >
> >> -Original message-
> >> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> >> Sent: Friday, 14 October 2016 17:05
> >> To: user@nutch.apache.org
> >> Subject: RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> >>
> >> Yes, current Solr comes with a schemaless mode, which can cause errors in
> >> some cases. It must be configured to use classic schema mode, then you
> >> provide it with the schema.xml file you can find in Nutch's conf directory.
> >>
> >> M.
> >>
> >>
> >>
> >> -Original message-
> >>> From:Tom Chiverton <t...@extravision.com>
> >>> Sent: Friday 14th October 2016 16:58
> >>> To: user@nutch.apache.org
> >>> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> >>>
> >>> I don't understand what you mean here. I am not a Solr expert, though
> >>> I've used it a bit in the past, though not with Nutch.
> >>>
> >>> Is there a schema I should be feeding it ?
> >>>
> >>> Tom
> >>>
> >>>
> >>> On 14/10/16 15:50, Markus Jelsma wrote:
> >>>> Solr supports schemaless mode, which may be your case. Perhaps it made
> >>>> your digest field multi valued. I'd suggest to use Solr's classic schema
> >>>> factory, and a fixed schema.
> >>>
> >
> 
> 


AW: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Felix von Zadow
> What version were you running ?

nutch 2.3.1 and solr 5.5.0
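
For reference, the configset approach Felix describes in this thread (copy the data_driven_schema_configs configset, replace its managed-schema with Nutch's schema.xml, then create the core from it) might be scripted roughly as follows. This is a sketch, not Felix's actual commands: the helper name `make_nutch_configset`, the `server/solr/configsets` layout and the example paths are assumptions, and a schema.xml that still carries options rejected by your Solr version would need cleaning first.

```shell
#!/bin/sh
# make_nutch_configset SOLR_DIR NUTCH_CONF_DIR NAME   (hypothetical helper)
# Copies the stock data-driven configset to a new configset called NAME
# and overwrites its managed-schema with Nutch's schema.xml.
make_nutch_configset() {
  solr_dir=$1
  nutch_conf=$2
  name=$3
  # Assumed layout of a Solr 5.x install.
  sets="$solr_dir/server/solr/configsets"
  cp -r "$sets/data_driven_schema_configs" "$sets/$name" || return 1
  cp "$nutch_conf/schema.xml" "$sets/$name/conf/managed-schema"
}

# Example (paths are placeholders):
#   make_nutch_configset /usr/local/solr-5.5.0 /usr/local/nutch2/conf my_config
#   /usr/local/solr-5.5.0/bin/solr create -c corename -d my_config
```

The stock configset is copied rather than edited in place so that it remains available for other cores.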


Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton

Hmm, that makes a certain sense.

But then I seem to be getting errors like

ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch] 
Caused by: enablePositionIncrements is not a valid option as of Lucene 5.0


Like the Nutch schema is for a much older Solr ?

What version were you running ?

On 14/10/16 16:17, Felix von Zadow wrote:

I had the same problem a while ago; I accidentally forgot to supply the schema 
when creating a core and had the digest/string*s* problem. Here's some more 
explanation of what I did, I hope I remember correctly:

I am using (just like Markus suggested) the schema.xml from nutch which sets [...]

-Original message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Friday, 14 October 2016 17:05
To: user@nutch.apache.org
Subject: RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

Yes, current Solr comes with a schemaless mode, which can cause errors in
some cases. It must be configured to use classic schema mode, then you provide
it with the schema.xml file you can find in Nutch's conf directory.

M.



-Original message-

From:Tom Chiverton <t...@extravision.com>
Sent: Friday 14th October 2016 16:58
To: user@nutch.apache.org
Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

I don't understand what you mean here. I am not a Solr expert, though
I've used it a bit in the past, though not with Nutch.

Is there a schema I should be feeding it ?

Tom


On 14/10/16 15:50, Markus Jelsma wrote:

Solr supports schemaless mode, which may be your case. Perhaps it made
your digest field multi valued. I'd suggest to use Solr's classic schema
factory, and a fixed schema.








Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton
I mean, of course, the schema browser shows them as strings, but the 
query interface now shows them as arrays e.g.


"digest": [ "0f1e6553ede0daec9c9449a3a80e3a80" ],

Tom

On 14/10/16 16:18, Tom Chiverton wrote:


But still, Solr schema browser is showing the digest field as string 
(as before) and my documents are listed (via the solr query web 
interface) as having string digests too !






Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton
Not sure what was going on, so I deleted the core and the underlying 
folders under solr/server/nutch, bounced the Solr service, and the 
schema browser in the Solr interface now shows the schema as expected.


If I put a single document in (i.e. a single URL in the seed list, then 
run inject, generate, fetch, parse, update and solrindex) then all is well. 
The schema browser in the Solr interface is showing the digest field as 
string.


If I then run "bin/crawl ..." it adds some more documents (as expected) 
but ultimately dies with the ClassCastException. As if I have a bad 
document in the index?


But still, Solr schema browser is showing the digest field as string (as 
before) and my documents are listed (via the solr query web interface) 
as having string digests too !


Tom


On 14/10/16 15:57, Tom Chiverton wrote:
I don't understand what you mean here. I am not a Solr expert, though 
I've used it a bit in the past, though not with Nutch.


Is there a schema I should be feeding it ?

Tom


On 14/10/16 15:50, Markus Jelsma wrote:
Solr supports schemaless mode, which may be your case. Perhaps it 
made your digest field multi valued. I'd suggest to use Solr's 
classic schema factory, and a fixed schema.








Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Felix von Zadow

I had the same problem a while ago; I accidentally forgot to supply the schema 
when creating a core and had the digest/string*s* problem. Here's some more 
explanation of what I did, I hope I remember correctly:

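I am using (just like Markus suggested) the schema.xml from nutch which sets
[...] and [...]

I duplicated the data_driven_schema_configs/conf/ configset and replaced 
the managed-schema file with Nutch's schema.xml so I have:
[...]/solr/configsets/my_config/managed_schema

Core is created like so:
solr create -c corename -d my_config


Hope that helps,
Felix

> -Original message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Friday, 14 October 2016 17:05
> To: user@nutch.apache.org
> Subject: RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> 
> Yes, current Solr comes with a schemaless mode, which can cause errors in
> some cases. It must be configured to use classic schema mode, then you provide
> it with the schema.xml file you can find in Nutch's conf directory.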
> 
> M.
> 
> 
> 
> -Original message-
> > From:Tom Chiverton <t...@extravision.com>
> > Sent: Friday 14th October 2016 16:58
> > To: user@nutch.apache.org
> > Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> >
> > I don't understand what you mean here. I am not a Solr expert, though
> > I've used it a bit in the past, though not with Nutch.
> >
> > Is there a schema I should be feeding it ?
> >
> > Tom
> >
> >
> > On 14/10/16 15:50, Markus Jelsma wrote:
> > > Solr supports schemaless mode, which may be your case. Perhaps it made
> > > your digest field multi valued. I'd suggest to use Solr's classic schema
> > > factory, and a fixed schema.
> >
> >


RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Markus Jelsma
Yes, current Solr comes with a schemaless mode, which can cause errors in some 
cases. It must be configured to use classic schema mode; you then provide it 
with the schema.xml file found in Nutch's conf directory.

M.
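
In Solr 5 and 6 the classic schema mode mentioned above is selected in the core's solrconfig.xml with `<schemaFactory class="ClassicIndexSchemaFactory"/>`, after which Solr reads schema.xml from the core's conf directory. As a rough check, the sketch below greps a solrconfig.xml for that class name. The function name `uses_classic_schema` and the path in the usage comment are illustrative assumptions, and the check is naive: a commented-out element would also match.

```shell
#!/bin/sh
# uses_classic_schema SOLRCONFIG   (hypothetical helper)
# Exits 0 if the given solrconfig.xml mentions ClassicIndexSchemaFactory,
# non-zero otherwise.
uses_classic_schema() {
  grep -q 'ClassicIndexSchemaFactory' "$1"
}

# Example (path is a placeholder for your core's config):
#   if uses_classic_schema server/solr/nutch/conf/solrconfig.xml; then
#     echo "classic schema mode: schema.xml will be used"
#   else
#     echo "no classic schema factory found: Solr may be running schemaless"
#   fi
```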

 
 
-Original message-
> From:Tom Chiverton <t...@extravision.com>
> Sent: Friday 14th October 2016 16:58
> To: user@nutch.apache.org
> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> 
> I don't understand what you mean here. I am not a Solr expert, though 
> I've used it a bit in the past, though not with Nutch.
> 
> Is there a schema I should be feeding it ?
> 
> Tom
> 
> 
> On 14/10/16 15:50, Markus Jelsma wrote:
> > Solr supports schemaless mode, which may be your case. Perhaps it made your 
> > digest field multi valued. I'd suggest to use Solr's classic schema 
> > factory, and a fixed schema.
> 
> 


Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton
I don't understand what you mean here. I am not a Solr expert, though 
I've used it a bit in the past, though not with Nutch.


Is there a schema I should be feeding it ?

Tom


On 14/10/16 15:50, Markus Jelsma wrote:

Solr supports schemaless mode, which may be your case. Perhaps it made your 
digest field multi valued. I'd suggest to use Solr's classic schema factory, 
and a fixed schema.




RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Markus Jelsma
Solr supports schemaless mode, which may be what you are running. Perhaps it 
made your digest field multivalued. I'd suggest using Solr's classic schema 
factory and a fixed schema.

m.
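
To confirm whether schemaless mode really did make digest multivalued, one option is to ask Solr's Schema API for the field definition and look at its multiValued property. The sketch below is illustrative only: `is_multivalued` is a made-up helper that inspects the JSON on stdin, and the URL in the usage comment assumes a core named nutch on localhost:8983, as used elsewhere in this thread.

```shell
#!/bin/sh
# is_multivalued   (hypothetical helper)
# Reads a Schema API JSON response on stdin and exits 0 if it reports
# "multiValued": true, non-zero otherwise.
is_multivalued() {
  grep -Eq '"multiValued"[[:space:]]*:[[:space:]]*true'
}

# Example against a live core (host, port and core name are assumptions).
# showDefaults=true asks Solr to include properties inherited from the
# field type, which is where multiValued may come from:
#   curl -s 'http://localhost:8983/solr/nutch/schema/fields/digest?showDefaults=true' \
#     | is_multivalued && echo "digest is multiValued"
```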

 
 
-Original message-
> From:Tom Chiverton <t...@extravision.com>
> Sent: Friday 14th October 2016 16:44
> To: user@nutch.apache.org
> Subject: Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:
> 
> Where would this be configured ? I'm creating the solr core by just doing
> 
> "solr/bin/solr create_core -c nutch"
> 
> should I be feeding it a special schema file somehow ?
> 
> Tom
> 
> 
> On 14/10/16 14:39, Markus Jelsma wrote:
> > Your digest field is configured as multi valued, which should not be the 
> > case.
> 
> 


Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton

Where would this be configured ? I'm creating the solr core by just doing

"solr/bin/solr create_core -c nutch"

should I be feeding it a special schema file somehow ?

Tom


On 14/10/16 14:39, Markus Jelsma wrote:

Your digest field is configured as multi valued, which should not be the case.




Nutch 2, Solr 5 - solrdedup causes ClassCastException:

2016-10-14 Thread Tom Chiverton
I've tried using both Solr 6 and 5 with the latest Nutch 2, and with 
both I am getting an error from Nutch's bin/crawl.


/mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local2123017879_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)

Error running:
  /mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch

Failed with exit value 1.

hadoop.log says

java.lang.Exception: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecordReader.nextKeyValue(SolrDeleteDuplicates.java:233)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Which appears to be related to the digest field somehow...

Is this a known bug ? Do I need a particular version of Nutch with a 
particular Solr or something ?

--
*Tom Chiverton*
Lead Developer
e:  t...@extravision.com 
p:  0161 817 2922
t:  @extravision 
w:  www.extravision.com 

Extravision - email worth seeing 
Registered in the UK at: 107 Timber Wharf, 33 Worsley Street, 
Manchester, M15 4LD.

Company Reg No: 05017214 VAT: GB 824 5386 19
