I'm getting the following error with Solr 4.10.4:

WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document :
SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,....
....
..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
org.apache.solr.common.SolrException: Exception writing document id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
...
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="f_dcperson" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message: bytes can be at most 32766 in length; got 38177
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
...
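
The exception only reports the immense term as a list of byte values. Decoding them back to text may make it easier to find the offending value in the source data. This is a minimal sketch in Python, not part of the Solr setup; the helper name decode_prefix is made up for illustration, and the byte list is copied from the message above.

```python
# Byte values copied from "The prefix of the first immense term" in the exception.
prefix = [102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115,
          101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]

def decode_prefix(values):
    """Turn the list of byte values from the log back into readable text.

    errors="replace" keeps the call from failing if the 30-byte prefix
    happens to cut a multi-byte UTF-8 character in half.
    """
    return bytes(values).decode("utf-8", errors="replace")

print(repr(decode_prefix(prefix)))
```

For the prefix above this gives 'for a review see Browning   20', which reads more like free descriptive text than a person name, so it may be worth checking what the source record carries in the fields that feed f_dcperson.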


My huge field is dcdescription, with the following schema:

   <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="dcdescription" type="string" indexed="false" stored="true" />
   <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
...
  <copyField source="dccreator" dest="f_dcperson" />
  <copyField source="dccontributor" dest="f_dcperson" />


I guess I also have to set multiValued="true" on dcdescription?

But why is it complaining about f_dcperson, which is already multiValued?

Second guess: dcdescription is not multiValued, but is filled to the maximum (32766).
It is then UTF-8 encoded and grows beyond 32766 bytes, which is larger than a
single value of a multiValued field may be, and hence the error?
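
The limit in the exception is stated in bytes of the UTF-8 encoding, not in characters, so a value can exceed it with fewer than 32766 characters. The rough sketch below checks values before they are sent to Solr. It assumes the oversized value arrives through one of the copyField sources shown in the schema; the document dict and the helper names utf8_len and oversized_values are hypothetical and exist only for this illustration.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the exception message

def utf8_len(value):
    """Length of the value in bytes once UTF-8 encoded."""
    return len(value.encode("utf-8"))

def oversized_values(doc, fields):
    """Return (field, byte_length) pairs for every value above the limit."""
    hits = []
    for name in fields:
        values = doc.get(name, [])
        if isinstance(values, str):  # treat a single value like a one-element list
            values = [values]
        for value in values:
            n = utf8_len(value)
            if n > MAX_TERM_BYTES:
                hits.append((name, n))
    return hits

# Made-up document: 20000 two-byte characters encode to 40000 bytes.
doc = {
    "dccreator": ["Browning"],
    "dccontributor": ["\u00e9" * 20000],
}
print(oversized_values(doc, ["dccreator", "dccontributor"]))
```

Run over the real import data, with dccreator and dccontributor as the fields to check, a test like this should point to the record and source field that produce the 38177-byte value.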

Is there a real explanation for this, and how can I prevent it?

Regards
Bernd
