I'm getting the following error with 4.10.4:

WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document : SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,.... .... ..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
org.apache.solr.common.SolrException: Exception writing document id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    ...
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="f_dcperson" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message: bytes can be at most 32766 in length; got 38177
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
    ...
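The byte prefix in the message can be turned back into text to see which value is involved. A minimal sketch in Python, used here only for illustration and not part of my import setup; the list is copied from the exception above, and the variable names are made up:

```python
# Byte prefix copied verbatim from the exception message above.
prefix = [102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32,
          115, 101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32,
          32, 32, 50, 48]

# The message reports the term as raw UTF-8 bytes, so decode them to text.
decoded = bytes(prefix).decode("utf-8")
print(repr(decoded))  # prints 'for a review see Browning   20'
```

That looks more like free text than a person's name, which suggests that one of the values ending up in f_dcperson is far longer than I would expect for that field.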
My huge field is dcdescription. The relevant part of the schema:

<field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
<field name="dcdescription" type="string" indexed="false" stored="true" />
<field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
...
<copyField source="dccreator" dest="f_dcperson" />
<copyField source="dccontributor" dest="f_dcperson" />

My first guess is that I have to make dcdescription multiValued="true" as well. But why is it complaining about f_dcperson, which is already multivalued?

My second guess: dcdescription is not multivalued, but it is filled to the maximum (32766). Once it is UTF-8 encoded it grows beyond 32766 bytes, which is more than a single value of a multivalued field may hold, and hence the error?

Is there a real explanation for this, and how can I prevent it?

Regards
Bernd
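P.S. To test the second guess, the values could be checked for their UTF-8 byte length before indexing. A rough sketch in Python, assuming the limit of 32766 applies to the UTF-8 bytes of one single value, as the message suggests; MAX_TERM_BYTES, utf8_len and oversized_values are names made up for illustration:

```python
MAX_TERM_BYTES = 32766  # limit quoted in the exception message


def utf8_len(value: str) -> int:
    """Return the number of bytes the value occupies once UTF-8 encoded."""
    return len(value.encode("utf-8"))


def oversized_values(values):
    """Return the values whose UTF-8 encoding exceeds the limit."""
    return [v for v in values if utf8_len(v) > MAX_TERM_BYTES]


# Same number of characters, different number of bytes.
ascii_value = "a" * 32766
umlaut_value = "ä" * 32766  # each 'ä' takes 2 bytes in UTF-8

print(utf8_len(ascii_value), utf8_len(umlaut_value))        # prints 32766 65532
print(len(oversized_values([ascii_value, umlaut_value])))   # prints 1
```

So a value can stay within 32766 characters and still exceed 32766 bytes once it contains non-ASCII characters.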