Thank you, that is what I have done.
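
For anyone following the thread, here is a rough sketch of the idea of stripping every String value before the field mapping runs, so that mapped fields and copy fields both receive the cleaned value. It is not the actual SolrWriter code: the class name FieldSanitizer, the sanitize method, and the plain Map used in place of a NutchDocument are assumptions made for illustration, and stripNonCharCodepoints below is a stand-in that only drops Unicode noncharacters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch or SolrJ.
class FieldSanitizer {

  // Stand-in for SolrWriter.stripNonCharCodepoints (assumed behaviour):
  // drops Unicode noncharacters, i.e. U+FDD0..U+FDEF and every code point
  // whose low 16 bits are FFFE or FFFF.
  static String stripNonCharCodepoints(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      boolean nonChar = (cp & 0xFFFE) == 0xFFFE || (cp >= 0xFDD0 && cp <= 0xFDEF);
      if (!nonChar) {
        out.appendCodePoint(cp);
      }
    }
    return out.toString();
  }

  // Returns a copy of the field map in which every String value has been
  // stripped. Non-String values (dates, numbers) are passed through unchanged.
  // Running this before any key mapping means a mapped field and its copy
  // field both see the same cleaned value.
  static Map<String, List<Object>> sanitize(Map<String, List<Object>> fields) {
    Map<String, List<Object>> clean = new LinkedHashMap<>();
    for (Map.Entry<String, List<Object>> e : fields.entrySet()) {
      List<Object> values = new ArrayList<>();
      for (Object val : e.getValue()) {
        values.add(val instanceof String ? stripNonCharCodepoints((String) val) : val);
      }
      clean.put(e.getKey(), values);
    }
    return clean;
  }
}
```

With something like this in place, the field name check for "content" or "e_features" is no longer needed, and unknown fields added by custom plugins are covered as well.
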
On Wed, Aug 10, 2011 at 4:06 PM, Markus Jelsma <[email protected]> wrote:
> Hmmm, maybe we should just strip the codepoints on all fields. We're already
> doing it on content which is by far the largest field, all other fields are
> tiny compared to this one. If we do it on all String fields then this would
> also fix unknown fields added by custom plugins.
>
> The part you refer to is for the solr field mapping code. Strip codepoints
> before the mapping code or you'll end up with one stripped and one not if you
> use copyFields in here.
>
>
> On Wednesday 10 August 2011 14:32:28 Cam Bazz wrote:
>> Hello,
>>
>> From SolrWriter.java:
>>
>>   public void write(NutchDocument doc) throws IOException {
>>
>>     final SolrInputDocument inputDoc = new SolrInputDocument();
>>
>>     for(final Entry<String, NutchField> e : doc) {
>>       for (final Object val : e.getValue().getValues()) {
>>
>>         // normalise the string representation for a Date
>>         Object val2 = val;
>>
>>         if (val instanceof Date){
>>           val2 = DateUtil.getThreadLocalDateFormat().format(val);
>>         }
>>
>>         if (e.getKey().equals("content")||e.getKey().equals("e_features")) {
>>           if(val!=null) {
>>             val2 = stripNonCharCodepoints((String)val);
>>           }
>>         }
>>
>>         inputDoc.addField(solrMapping.mapKey(e.getKey()), val2,
>>             e.getValue().getWeight());
>>         String sCopy = solrMapping.mapCopyKey(e.getKey());
>>         if (sCopy != e.getKey()) {
>>           inputDoc.addField(sCopy, val);
>>         }
>>
>>       }
>>     }
>>     inputDoc.setDocumentBoost(doc.getWeight());
>>     inputDocs.add(inputDoc);
>>     if (inputDocs.size() >= commitSize) {
>>       try {
>>         LOG.info("Adding " + Integer.toString(inputDocs.size()) + " documents");
>>         solr.add(inputDocs);
>>       } catch (final SolrServerException e) {
>>         throw makeIOException(e);
>>       }
>>       inputDocs.clear();
>>     }
>>   }
>>
>>
>> What is happening after inputDoc.addField.... ? I am getting an exception
>> while indexing e_features because of a UTF8 encoding error. Previously
>> we patched this problem for content, and now I have another
>> field called e_features, and I wanted to stripNonCharCodepoints from
>> that as well, but I don't understand why we are doing the if (sCopy !=
>> e.getKey()) { inputDoc.addField(sCopy, val);}
>>
>>
>> Best Regards,
>> C.B.
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

