[ https://issues.apache.org/jira/browse/OAK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabrizio Fortino resolved OAK-9123.
-----------------------------------
    Resolution: Fixed

Fixed with revision 1879243

> Error: Document contains at least one immense term
> --------------------------------------------------
>
>                 Key: OAK-9123
>                 URL: https://issues.apache.org/jira/browse/OAK-9123
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: elastic-search, indexing, search
>            Reporter: Fabrizio Fortino
>            Assignee: Fabrizio Fortino
>            Priority: Major
>
> {code:java}
> 11:35:09.400 [I/O dispatcher 1] ERROR o.a.j.o.p.i.e.i.ElasticIndexWriter - Bulk item with id /wikipedia/76/84/National Palace (Mexico) failed
> org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="text.keyword" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 123, 73, 110, 102, 111, 98, 111, 120, 32, 104, 105, 115, 116, 111, 114, 105, 99, 32, 98, 117, 105, 108, 100, 105, 110, 103, 10, 124, 110]...', original message: bytes can be at most 32766 in length; got 33409]
> 	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
> 	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
> 	at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:138)
> 	at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:196)
> 	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1888)
> 	at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1676)
> 	at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1758)
> 	at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:590)
> 	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:333)
> 	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327)
> 	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
> 	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
> 	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
> 	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
> 	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
> 	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
> 	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
> 	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
> 	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
> 	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
> 	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 33409]
> 	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
> 	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
> 	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
> 	... 24 common frames omitted
> {code}
>
> This happens with huge keyword fields, since Lucene doesn't allow terms longer than 32766 bytes.
> See [https://discuss.elastic.co/t/error-document-contains-at-least-one-immense-term-in-field/66486]
> We have decided to always create keyword fields to remove the need to specify properties like ordered or facet. In this way every field can be sorted or used as a facet.
> In this specific case the keyword field won't be needed at all, but it would be hard to decide when to include it or not. To solve this we are going to use `ignore_above=256`, so huge keyword values will be ignored.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
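The `ignore_above` fix can be sketched as a multi-field mapping like the one below. This is an illustrative fragment only, not the actual Oak-generated mapping; the `text` field name is taken from the `text.keyword` field in the error above:

{code:json}
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
{code}

With this setting, string values longer than 256 characters are simply not indexed into the `keyword` sub-field (they remain available in `_source` and in the analyzed `text` field), so oversized values can no longer produce single keyword terms that exceed Lucene's 32766-byte term limit.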