Stupid me (yet again):
I should have used a TEXT field instead of (only) a STRING field for the content ;)
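For reference, a minimal sketch of that schema change (field and type names are illustrative; a "string" field is not analyzed, so the entire value becomes a single term, while a tokenized text type splits it into many small terms):

```xml
<!-- before: unanalyzed; the whole content value becomes one (possibly immense) term -->
<field name="content__s_i_suggest" type="string" indexed="true" stored="true"/>

<!-- after: an analyzed text type tokenizes the content into individual terms -->
<field name="content__s_i_suggest" type="text_general" indexed="true" stored="true"/>
```

Note that changing the field type requires reindexing the affected documents.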
Another question I have though (which fits the subject even better):
In the log I see many
org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
...
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
What are possible reasons for this?
Thx
Clemens
-----Original Message-----
From: Clemens Wyss DEV [mailto:[email protected]]
Sent: Friday, 24 April 2015 14:01
To: [email protected]
Subject: o.a.s.c.SolrException: missing content stream
Context: Solr/Lucene 5.1
Adding documents to Solr core/index through SolrJ
I extract PDFs using Tika. The PDF content is one of the fields of my SolrDocuments, which are transmitted to Solr via SolrJ.
As not all documents seem to be "coming through", I looked into the Solr logs and saw the following exceptions:
org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#4614 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
    ...
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[10, 32, 10, 32, 10, 10, 70, 82, 32, 77, 111, 100, 101, 32, 100, 39, 101, 109, 112, 108, 111, 105, 32, 10, 10, 32, 10, 10, 32, 10]...', original message: bytes can be at most 32766 in length; got 186493
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
    ... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 186493
    at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
    ... 47 more
How can I tell Solr/SolrJ to allow more payload?
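One client-side workaround, if truncating the field value is acceptable: cap the string at 32766 UTF-8 bytes before putting it into the SolrInputDocument. This is only a sketch with a hypothetical helper (not a SolrJ API); the real fix is to use an analyzed text field so no single term exceeds Lucene's per-term limit. Care is needed not to cut a multi-byte character in half:

```java
import java.nio.charset.StandardCharsets;

public class TermTruncator {
    // Lucene's hard per-term limit (BytesRefHash): 32766 bytes of UTF-8.
    static final int MAX_TERM_BYTES = 32766;

    /** Truncate s so its UTF-8 encoding fits in maxBytes, without splitting a code point. */
    static String truncateUtf8(String s, int maxBytes) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return s;
        }
        int end = maxBytes;
        // Back up past any UTF-8 continuation bytes (10xxxxxx) so we cut on a character boundary.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(utf8, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String big = "ä".repeat(20000); // 40000 bytes in UTF-8
        String cut = truncateUtf8(big, MAX_TERM_BYTES);
        // The truncated value now fits under the limit.
        System.out.println(cut.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The truncated value would then be set on the document as usual, e.g. doc.setField("content__s_i_suggest", truncateUtf8(content, MAX_TERM_BYTES)). Note this silently drops content beyond the limit.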
I also see some
org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#3323 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
    ...
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[10, 69, 78, 32, 76, 67, 68, 32, 116, 101, 108, 101, 118, 105, 115, 105, 111, 110, 10, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95]...', original message: bytes can be at most 32766 in length; got 164683
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
    ... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 164683
    at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
    ... 47 more
These seem to result from the same "limitation".
Unfortunately I must extract the PDFs in my client.
Thx
Clemens