The following is my experience with this issue in the 2.x branch, which I
believe is related.

At some point a change was made to remove a dependency on Lucene:
https://issues.apache.org/jira/browse/NUTCH-930

However, the change inadvertently broke the tstamp field: although the
field was still declared as a long in the schema, what was being written
wasn't actually a long, as you observed.

Instead of fixing the change so that a long was stored correctly, the schema
shipped with Nutch was updated to use "date" instead of "long", and the
necessary code changes were made to support this.  This means the updated
Nutch expects to read a "date".

Therefore one option that should work (if you have the latest Nutch) is to
use "date" rather than "long" for the "tstamp" field in your schema, since
this is what Nutch is expecting.  If you can change your schema, this is
what I would recommend trying.  Take a look at the schema that comes with
the latest Nutch to see what I'm talking about.
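
To make the difference between the two field types concrete, here is a
minimal sketch of the same timestamp as a "long" (epoch milliseconds) and as
the ISO 8601 UTC string that a Solr "date" field accepts.  This is not Nutch's
own code: the class and method names are made up for illustration, and it
uses java.time, which is newer than the Java version Nutch was built against
at the time.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch or Solr: shows the two
// representations of a timestamp discussed above.
public class TstampFormats {

    // Epoch milliseconds -> ISO 8601 UTC string, e.g. "1970-01-01T00:00:00Z",
    // which is the textual form a Solr "date" field expects.
    static String toSolrDate(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));
    }

    // ISO 8601 UTC string -> epoch milliseconds, the value a "long" field
    // would hold.
    static long toEpochMillis(String solrDate) {
        return Instant.parse(solrDate).toEpochMilli();
    }

    public static void main(String[] args) {
        long tstamp = System.currentTimeMillis();
        String asDate = toSolrDate(tstamp);
        System.out.println("long: " + tstamp);
        System.out.println("date: " + asDate);
        // Converting back recovers the original value.
        System.out.println("round trip ok: " + (toEpochMillis(asDate) == tstamp));
    }
}
```

A field declared as "long" cannot parse the string form, and a field declared
as "date" will not take a bare number, which is why the schema and the code
that writes the field have to agree.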


Since I wasn't sure at the time whether anyone else was using the existing
"long" field in Solr, I didn't want to change the schema, so what I did was
change the code to use "long" instead.  That is another option, but then you
have to maintain the modified code separately.

To do this I had to change the SolrDeleteDuplicates code, and the code that
actually indexes the field (BasicIndexingFilter) also has to be changed from
date to long.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/updating-from-nutch-1-2-to-nutch-1-7-with-Solr-1-4-1-dedup-crashes-tp4077567p4077612.html
Sent from the Nutch - User mailing list archive at Nabble.com.
