Hi Bernd,
The dcdescription field is not indexed (indexed="false" in your schema), so it never produces terms and cannot trigger this error; the oversized value really is ending up in f_dcperson.
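If you also want f_dcperson to survive oversized values, one option (only a sketch, untested against your schema, and the type name string_capped is invented) is to replace the plain string type with a TextField that keeps the whole value as one token but truncates it below Lucene's 32766-byte term limit:

     <fieldType name="string_capped" class="solr.TextField" sortMissingLast="true">
       <analyzer>
         <!-- keep the whole value as a single token, like "string" -->
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <!-- cap each token at 8000 chars; 8000 * 4 bytes/char max stays under 32766 -->
         <filter class="solr.TruncateTokenFilterFactory" prefixLength="8000"/>
       </analyzer>
     </fieldType>

     <field name="f_dcperson" type="string_capped" indexed="true" stored="true" multiValued="true" />

That only hides the symptom, of course; the real fix is to keep the description text out of f_dcperson in the first place.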

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 11.05.2015 15:22, Bernd Fehling wrote:
Hi Emir,

the dcdescription field is definitely too big.
But why is it complaining about f_dcperson and not dcdescription?

Regards
Bernd


On 11.05.2015 at 15:12, Emir Arnautovic wrote:
Hi Bernd,
The issue is with f_dcperson and what ends up in that field. It is configured as
string, which means it is not tokenized, so if some huge value is in either
dccreator or dccontributor it ends up as a single term. The name suggests it
should not contain such values, but double check in your import code whether you
are reading the wrong column, concatenating contributors, or doing something else
that makes the value too big. Also check whether you have some copyField that
should not be there.
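By the way, the term prefix in your log decodes (plain ASCII) to "for a review see Browning   20", which looks like description text rather than a person name. A leftover rule like the following (made up here, just to show the pattern to search for in schema.xml) would be enough to cause exactly that:

    <!-- hypothetical stray rule - remove it if something like this exists -->
    <copyField source="dcdescription" dest="f_dcperson" />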

Thanks,
Emir
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 11.05.2015 14:13, Bernd Fehling wrote:
I'm getting the following error with 4.10.4

WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document :
SolrInputDocument(fields: [dcautoclasscode=310, dclang=unknown,....
....
..., dcdocid=dd05ad427a58b49150a4ca36148187028562257a77643062382a1366250112ac])
org.apache.solr.common.SolrException: Exception writing document id
ftumdeepblue:oai:deepblue.lib.umich.edu:2027.42/79437 to the index; possible analysis error.
          at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
          at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
          at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
...
          at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
          at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term
in field="f_dcperson" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.
Please correct the analyzer to not produce such terms. The prefix of the first immense term is:
'[102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114, 111, 119, 110, 105, 110, 103, 32, 32, 32, 50, 48]...',
original message: bytes can be at most 32766 in length; got 38177
          at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
...


My huge field is dcdescription, with the following schema:

     <field name="dccreator" type="string" indexed="true" stored="true" multiValued="true" />
     <field name="dcdescription" type="string" indexed="false" stored="true" />
     <field name="f_dcperson" type="string" indexed="true" stored="true" multiValued="true" />
...
    <copyField source="dccreator" dest="f_dcperson" />
    <copyField source="dccontributor" dest="f_dcperson" />


I guess I have to make dcdescription also multiValued="true"?

But why is it complaining about f_dcperson, which is already multiValued?

Second guess: dcdescription is not multiValued, but is filled close to the
maximum (32766). After UTF-8 encoding it grows beyond 32766, which is larger
than a single value of a multiValued field may be, and that causes the error?

Is there a real explanation for this, and how can I prevent it?

Regards
Bernd
