Hi Emir,

The dcdescription field is definitely too big.
But why is it complaining about f_dcperson and not dcdescription?
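
In case the oversized value really arrives through the copyFields, could I
protect the field at index time with an update processor? A rough sketch with
solr.TruncateFieldUpdateProcessorFactory (chain name and maxLength are just my
guesses; maxLength counts characters, so I would keep it well below the
32766-byte term limit):

    <updateRequestProcessorChain name="truncate-immense">
      <!-- cut f_dcperson values down before they reach the index -->
      <processor class="solr.TruncateFieldUpdateProcessorFactory">
        <str name="fieldName">f_dcperson</str>
        <int name="maxLength">10000</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Or would that just hide the real bug in the import?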

Regards
Bernd


On 11.05.2015 at 15:12 Emir Arnautovic wrote:
> Hi Bernd,
> The issue is with f_dcperson and what ends up in that field. It is configured
> as a string, which means it is not tokenized, so if some huge value comes in
> from either dccreator or dccontributor it ends up as a single term. The names
> suggest those fields should not contain such values, so double-check your
> import code: maybe you are reading the wrong column, concatenating all
> contributors, or doing something else that makes the value too big. Also
> check whether you have some copyField that should not be there.
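> 
> If you want to be safe no matter what the import produces, one option is a
> field type that behaves like "string" but caps the term length. A rough
> sketch, not tested against your schema (the type name and prefixLength are
> my choice; both factories should be available in 4.10):
> 
>     <fieldType name="string_capped" class="solr.TextField">
>       <analyzer>
>         <!-- keep the whole value as one token, like "string" does -->
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <!-- cap tokens at 1000 chars, far below the 32766-byte limit -->
>         <filter class="solr.TruncateTokenFilterFactory" prefixLength="1000"/>
>       </analyzer>
>     </fieldType>
> 
> Pointing f_dcperson at such a type avoids the immense-term error, at the
> price of silently truncated values.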
> 
> Thanks,
> Emir
> -- 
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On 11.05.2015 14:13, Bernd Fehling wrote:
>> I'm getting the following error with 4.10.4
>>
>> WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating 
>> document :
>> SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,....
>> ....
>> ..., 
>> dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
>> org.apache.solr.common.SolrException: Exception writing document
>> id ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; 
>> possible analysis error.
>>          at 
>> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
>>          at 
>> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>>          at 
>> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>> ...
>>          at 
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
>>          at 
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
>> Caused by: java.lang.IllegalArgumentException: Document contains at least 
>> one immense term
>> in field="f_dcperson" (whose UTF8 encoding is longer than the max length 
>> 32766), all of which were skipped.
>> Please correct the analyzer to not produce such terms.  The prefix of the 
>> first immense
>> term is: '[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 
>> 101, 101, 32, 66, 114,
>> 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...', original message:
>> bytes can be at most 32766 in length; got 38177
>>          at 
>> org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
>> ...
>>
>>
>> My huge field is dcdescription, with the following schema:
>>
>>     <field name="dccreator" type="string" indexed="true" stored="true" 
>> multiValued="true" />
>>     <field name="dcdescription" type="string" indexed="false" stored="true" 
>> />
>>     <field name="f_dcperson" type="string" indexed="true" stored="true" 
>> multiValued="true" />
>> ...
>>    <copyField source="dccreator" dest="f_dcperson" />
>>    <copyField source="dccontributor" dest="f_dcperson" />
>>
>>
>> I guess I have to make dcdescription multiValued="true" as well?
>>
>> But why is it complaining about f_dcperson, which is already multiValued?
>>
>> Second guess: dcdescription is not multiValued but filled to the maximum
>> (32766). Then it gets UTF-8 encoded and grows beyond 32766, which is larger
>> than a single value of a multiValued field may be, hence the error?
>>
>> Is there a real explanation for this, and how can I prevent it?
>>
>> Regards
>> Bernd
> 

-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
