Hello,
From SolrWriter.java:
public void write(NutchDocument doc) throws IOException {
  final SolrInputDocument inputDoc = new SolrInputDocument();
  for (final Entry<String, NutchField> e : doc) {
    for (final Object val : e.getValue().getValues()) {
      // normalise the string representation for a Date
      Object val2 = val;
      if (val instanceof Date) {
        val2 = DateUtil.getThreadLocalDateFormat().format(val);
      }
      if (e.getKey().equals("content") || e.getKey().equals("e_features")) {
        if (val != null) {
          val2 = stripNonCharCodepoints((String) val);
        }
      }
      inputDoc.addField(solrMapping.mapKey(e.getKey()), val2,
          e.getValue().getWeight());
      String sCopy = solrMapping.mapCopyKey(e.getKey());
      if (sCopy != e.getKey()) {
        inputDoc.addField(sCopy, val);
      }
    }
  }
  inputDoc.setDocumentBoost(doc.getWeight());
  inputDocs.add(inputDoc);
  if (inputDocs.size() >= commitSize) {
    try {
      LOG.info("Adding " + Integer.toString(inputDocs.size()) + " documents");
      solr.add(inputDocs);
    } catch (final SolrServerException e) {
      throw makeIOException(e);
    }
    inputDocs.clear();
  }
}
What happens after inputDoc.addField(...)? I am getting an exception
while indexing e_features because of a UTF-8 encoding error. We
previously patched this problem for content, and now I have another
field called e_features that I want to pass through
stripNonCharCodepoints as well. What I don't understand is why we do
if (sCopy != e.getKey()) { inputDoc.addField(sCopy, val); }
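
For context, the stripNonCharCodepoints helper is not shown in the snippet above. A minimal sketch of what such a helper might look like, assuming it drops Unicode noncharacters, lone surrogates, and control characters other than tab, LF, and CR. The class name NonCharStripper and the isAllowed method are hypothetical and are not taken from Nutch:

```java
// Hypothetical stand-alone helper; not the actual Nutch implementation.
final class NonCharStripper {
    private NonCharStripper() {}

    // Returns a copy of input without code points that commonly break
    // XML/UTF-8 serialisation when documents are sent to an indexer.
    static String stripNonCharCodepoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);   // handles surrogate pairs
            i += Character.charCount(cp);
            if (isAllowed(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // Assumed filtering rules; adjust to whatever the indexer rejects.
    static boolean isAllowed(int cp) {
        // Unpaired surrogates cannot be encoded as valid UTF-8.
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        // Noncharacter block U+FDD0..U+FDEF.
        if (cp >= 0xFDD0 && cp <= 0xFDEF) return false;
        // U+FFFE and U+FFFF in every plane (U+1FFFE, U+1FFFF, ...).
        if ((cp & 0xFFFE) == 0xFFFE) return false;
        // Control characters, keeping tab, line feed, carriage return.
        if (cp < 0x20 && cp != 0x9 && cp != 0xA && cp != 0xD) return false;
        return true;
    }
}
```

With a helper like this, NonCharStripper.stripNonCharCodepoints(s) would take the place of the stripNonCharCodepoints((String) val) call in write().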
Best Regards,
C.B.