Hello,

From SolrWriter.java:

  public void write(NutchDocument doc) throws IOException {

    final SolrInputDocument inputDoc = new SolrInputDocument();

    for(final Entry<String, NutchField> e : doc) {
      for (final Object val : e.getValue().getValues()) {
        
        // normalise the string representation for a Date
        Object val2 = val;

        if (val instanceof Date){
          val2 = DateUtil.getThreadLocalDateFormat().format(val);
        }

        if (e.getKey().equals("content") || e.getKey().equals("e_features")) {
          if (val != null) {
            val2 = stripNonCharCodepoints((String) val);
          }
        }

        inputDoc.addField(solrMapping.mapKey(e.getKey()), val2, e.getValue().getWeight());
        String sCopy = solrMapping.mapCopyKey(e.getKey());
        if (sCopy != e.getKey()) {
          inputDoc.addField(sCopy, val);
        }

      }
    }
    inputDoc.setDocumentBoost(doc.getWeight());
    inputDocs.add(inputDoc);
    if (inputDocs.size() >= commitSize) {
      try {
        LOG.info("Adding " + Integer.toString(inputDocs.size()) + " documents");
        solr.add(inputDocs);
      } catch (final SolrServerException e) {
        throw makeIOException(e);
      }
      inputDocs.clear();
    }
  }


What happens after inputDoc.addField(...)? I am getting an exception
while indexing e_features because of a UTF-8 encoding error. We
previously patched this problem for the content field, and now I have
another field called e_features that I want to run through
stripNonCharCodepoints as well. But I don't understand why we are
doing:

  if (sCopy != e.getKey()) { inputDoc.addField(sCopy, val); }
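
For context, here is a minimal sketch of what I understand a helper like
stripNonCharCodepoints to do. This is an assumption on my side, not the
patched code itself: the class name CodepointSketch is made up, and I am
assuming the helper drops Unicode noncharacters, unpaired surrogates and
control characters other than tab, newline and carriage return.

```java
// Hypothetical sketch, not the actual patched helper.
final class CodepointSketch {

  // Returns a copy of input without code points that commonly break
  // XML/UTF-8 serialisation when a document is sent to the index.
  static String stripNonCharCodepoints(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);

      // Noncharacters: U+FDD0..U+FDEF and the last two code points of
      // every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
      boolean nonChar = (cp & 0xFFFE) == 0xFFFE
          || (cp >= 0xFDD0 && cp <= 0xFDEF);
      // codePointAt returns the bare surrogate when it is unpaired.
      boolean loneSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
      // Control characters, keeping tab, newline and carriage return.
      boolean control = cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r';

      if (!nonChar && !loneSurrogate && !control) {
        out.appendCodePoint(cp);
      }
    }
    return out.toString();
  }
}
```

If that is roughly what the helper does, then the copy-field branch
still passes the unstripped value, since it calls addField(sCopy, val)
rather than addField(sCopy, val2). I can't tell whether that is where my
exception comes from.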


Best Regards,
C.B.
