[ https://issues.apache.org/jira/browse/SOLR-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cao Manh Dat updated SOLR-8495:
-------------------------------
    Attachment: SOLR-8495.patch

Here is the initial patch for this issue. It is based on idea #1 from [~steve_rowe].

This patch introduces a new {{ParseLongStringFieldUpdateProcessorFactory}}, which performs the following check:
{code}
if (valSize > 32000) {
  return new LongStringField(stringVal);
}
{code}

So we can add a new type mapping to {{AddSchemaFieldsUpdateProcessorFactory}}:
{code}
<lst name="typeMapping">
  <str name="valueClass">org.apache.solr.update.processor.LongStringField</str>
  <str name="fieldType">lstring</str>
</lst>
{code}

There are some problems with this approach:
- We must define the chunk size (the size of the pieces into which a large string is split) inside the schema file (for {{ChunkTokenizerFactory}}), not inside solrconfig.
- In the multi-valued case, what should we do when the first value is > 32KB and the second value is < 32KB? With this patch, the first value is mapped to LongStringField and the second value is still a String, so {{AddSchemaFieldsUpdateProcessor#mapValueClassesToFieldType}} will create a field based on {{defaultFieldType}} (should we modify the method?).

> Schemaless mode cannot index large text fields
> ----------------------------------------------
>
>                 Key: SOLR-8495
>                 URL: https://issues.apache.org/jira/browse/SOLR-8495
>             Project: Solr
>          Issue Type: Bug
>          Components: Data-driven Schema, Schema and Analysis
>    Affects Versions: 4.10.4, 5.3.1, 5.4
>            Reporter: Shalin Shekhar Mangar
>              Labels: difficulty-easy, impact-medium
>             Fix For: 5.5, 6.0
>
>         Attachments: SOLR-8495.patch
>
>
> The schemaless mode by default indexes all string fields into an indexed
> StrField which is limited to 32KB text. Anything larger than that leads to an
> exception during analysis.
> {code}
> Caused by: java.lang.IllegalArgumentException: Document contains at least one
> immense term in field="text" (whose UTF8 encoding is longer than the max
> length 32766)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
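
For illustration only, the size check and the multi-valued question raised in the comment could be sketched as follows. This is not code from the attached patch: the class {{LongStringCheck}} and its methods are hypothetical, the sketch assumes the size is measured in UTF-8 bytes (the exception reports the limit in UTF-8 bytes, but the patch's {{valSize}} may be computed differently), and treating a whole field as long when any one value is over the threshold is just one possible answer to the open question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper, not part of Solr or of SOLR-8495.patch.
class LongStringCheck {
    // Threshold taken from the check quoted in the comment; the hard limit
    // reported in the exception is 32766 UTF-8 bytes.
    static final int THRESHOLD = 32000;

    // Assumption: size is measured as the UTF-8 encoded length, not the char count.
    static boolean exceedsLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length > THRESHOLD;
    }

    // One possible policy for multi-valued fields: if any value is too long,
    // treat every value of the field as a long string so that a single
    // field type can be chosen for all of them.
    static boolean anyExceedsLimit(Collection<String> values) {
        for (String value : values) {
            if (exceedsLimit(value)) {
                return true;
            }
        }
        return false;
    }

    // Builds a string consisting of the same character repeated count times.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String small = "hello";
        String large = repeat('a', 40000);
        // 20000 chars, but each one encodes to 2 bytes in UTF-8.
        String accented = repeat('\u00e9', 20000);

        System.out.println(exceedsLimit(small));
        System.out.println(exceedsLimit(large));
        System.out.println(exceedsLimit(accented));
        System.out.println(anyExceedsLimit(Arrays.asList(large, small)));
    }
}
```

The accented string shows why the unit matters: its char count is below the threshold while its UTF-8 length is above it, so a check based on {{String#length()}} would let it through to a StrField.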