Hi Emir,

the dcdescription field is definitely too big. But why is it complaining about f_dcperson and not dcdescription?
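If some runaway value really is coming in through dccreator or dccontributor, I could probably cap it before it reaches the index with a truncating update chain, roughly like this (untested sketch; the maxSize of 1000 characters is just a guess, and I think the chain still has to be wired into the /dataimport handler via update.chain):

    <!-- solrconfig.xml: cap the copyField sources so nothing immense reaches f_dcperson -->
    <updateRequestProcessorChain name="truncate-persons">
      <processor class="solr.TruncateFieldUpdateProcessorFactory">
        <str name="fieldName">dccreator</str>
        <str name="fieldName">dccontributor</str>
        <!-- maximum length in characters; 1000 is only a guess -->
        <int name="maxSize">1000</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
        <str name="update.chain">truncate-persons</str>
      </lst>
    </requestHandler>

Since copyField is applied at index time, after the update chain has run, truncating the source fields should also cap what lands in f_dcperson.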
Regards
Bernd

On 11.05.2015 at 15:12, Emir Arnautovic wrote:
> Hi Bernd,
> The issue is with f_dcperson and what ends up in that field. It is
> configured to be a string, which means it is not tokenized, so if some
> huge value is in either dccreator or dccontributor it will end up as a
> single term. The names suggest that these fields should not contain
> such values, but double check in your import code whether you are
> reading the wrong column, concatenating contributors, or doing
> something else that causes the value to be too big. Also check whether
> you have some copyField that should not be there.
>
> Thanks,
> Emir
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On 11.05.2015 14:13, Bernd Fehling wrote:
>> I'm getting the following error with 4.10.4
>>
>> WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document :
>> SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,....
>> ....
>> ...,
>> dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
>> org.apache.solr.common.SolrException: Exception writing document
>> id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index;
>> possible analysis error.
>>     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
>>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>>     at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>>     ...
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
>> Caused by: java.lang.IllegalArgumentException: Document contains at least
>> one immense term in field="f_dcperson" (whose UTF8 encoding is longer
>> than the max length 32766), all of which were skipped.
>> Please correct the analyzer to not produce such terms. The prefix of the
>> first immense term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105,
>> 101, 119, 32, 115, 101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103,
>> 32, 32, 32, 50, 48]...', original message:
>> bytes can be at most 32766 in length; got 38177
>>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
>>     ...
>>
>>
>> My huge field is dcdescription, with the following schema:
>>
>> <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
>> <field name="dcdescription" type="string" indexed="false" stored="true" />
>> <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
>> ...
>> <copyField source="dccreator" dest="f_dcperson" />
>> <copyField source="dccontributor" dest="f_dcperson" />
>>
>>
>> I guess I have to make dcdescription also multiValued="true"?
>>
>> But why is it complaining about f_dcperson, which is already multiValued?
>>
>> Second guess: dcdescription is not multiValued but filled to the max
>> (32766). It is then UTF-8 encoded, going beyond 32766, which is larger
>> than a single value of a multiValued field may be, hence the error?
>>
>> Any real explanation of this and how to prevent it?
>>
>> Regards
>> Bernd
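Alternatively, if the data can't be fully trusted, f_dcperson could be switched from plain "string" to a keyword-tokenized text type that simply drops oversized values at analysis time, something like the following (untested sketch; note that LengthFilter counts characters rather than UTF-8 bytes, so a max well below 32766 is the safer choice for non-ASCII names):

    <!-- schema.xml: behaves like "string" but silently skips oversized values -->
    <fieldType name="string_capped" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- keep the whole value as a single token, as "string" does -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- drop tokens that could exceed Lucene's 32766-byte term limit -->
        <filter class="solr.LengthFilterFactory" min="1" max="10000"/>
      </analyzer>
    </fieldType>

    <field name="f_dcperson" type="string_capped" indexed="true" stored="true" multiValued="true"/>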
-- 
*************************************************************
Bernd Fehling                 Bielefeld University Library
Dipl.-Inform. (FH)            LibTec - Library Technology
Universitätsstr. 25           and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060         bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************