[ https://issues.apache.org/jira/browse/LUCENE-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208235#comment-16208235 ]

nkeet shah commented on LUCENE-7538:
------------------------------------

Erick, we are not using Solr purely as a search engine; we are using it in a 
slightly non-traditional manner, and our use case of 1 GB files is very 
valid. Thanks, Michael, for the pointer.

> Uploading large text file to a field causes "this IndexWriter is closed" error
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-7538
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7538
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.5.1
>            Reporter: Steve Chen
>             Fix For: 6.4, 7.0
>
>         Attachments: LUCENE-7538.patch, LUCENE-7538.patch
>
>
> We have seen a "this IndexWriter is closed" error after we tried to upload 
> a large text file to a single Solr text field. The field definition in 
> schema.xml is:
> {noformat}
> <field name="fileContent" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
> {noformat}
> After that, the IndexWriter remained closed and couldn't be recovered until 
> we reloaded the Solr core. The text file had a size of 800 MB and contained 
> only numbers and English characters. The stack trace is shown below:
> {noformat}
> 2016-11-02 23:00:17,913 [http-nio-19082-exec-3] ERROR org.apache.solr.handler.RequestHandlerBase - org.apache.solr.common.SolrException: Exception writing document id 1487_0_1 to the index; possible analysis error.
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:180)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:934)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1089)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:712)
>         at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:651)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
>         at veeva.ecm.common.interfaces.web.SolrDispatchOverride.doFilter(SolrDispatchOverride.java:43)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
>         at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
>         at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
>         at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
>         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
>         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>         at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:720)
>         at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:734)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
>         ... 34 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 56
>         at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:201)
>         at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:183)
>         at org.apache.lucene.codecs.compressing.GrowableByteArrayDataOutput.writeString(GrowableByteArrayDataOutput.java:72)
>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:292)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:382)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
>         ... 37 more
> {noformat}
> We debugged and traced the issue down to an integer overflow that was not 
> properly handled. The GrowableByteArrayDataOutput.writeString(String) 
> method is shown below:
> {noformat}
> @Override
> public void writeString(String string) throws IOException {
>   int maxLen = string.length() * UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR;
>   if (maxLen <= MIN_UTF8_SIZE_TO_ENABLE_DOUBLE_PASS_ENCODING) {
>     // string is small enough that we don't need to save memory by falling
>     // back to the double-pass approach; this is just an optimized
>     // writeString() that re-uses scratchBytes.
>     scratchBytes = ArrayUtil.grow(scratchBytes, maxLen);
>     int len = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), scratchBytes);
>     writeVInt(len);
>     writeBytes(scratchBytes, len);
>   } else {
>     // use a double-pass approach to avoid allocating a large intermediate
>     // buffer for string encoding
>     int numBytes = UnicodeUtil.calcUTF16toUTF8Length(string, 0, string.length());
>     writeVInt(numBytes);
>     bytes = ArrayUtil.grow(bytes, length + numBytes);
>     length = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), bytes, length);
>   }
> }
> {noformat}
> The 800 MB text file stored in the string parameter of the method had a 
> length of about 800 million characters, so maxLen became a negative integer 
> when that length was multiplied by 3. Since a negative maxLen satisfies 
> maxLen <= MIN_UTF8_SIZE_TO_ENABLE_DOUBLE_PASS_ENCODING, the single-pass 
> branch ran, and the negative value was passed into 
> ArrayUtil.grow(scratchBytes, maxLen):
> {noformat}
> public static byte[] grow(byte[] array, int minSize) {
>   assert minSize >= 0 : "size must be positive (got " + minSize + "): likely integer overflow?";
>   if (array.length < minSize) {
>     byte[] newArray = new byte[oversize(minSize, 1)];
>     System.arraycopy(array, 0, newArray, 0, array.length);
>     return newArray;
>   } else
>     return array;
> }
> {noformat}
> Assertions were disabled in production, so execution did not stop at the 
> assert. Because minSize was negative, array.length < minSize was false, so 
> the original array was returned without being grown, and the subsequent 
> UTF-8 encoding then threw an ArrayIndexOutOfBoundsException. That exception 
> was wrapped in an AbortingException, which in turn caused the IndexWriter 
> to close itself.
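> To make the failure mode concrete, here is a minimal standalone sketch of 
> the arithmetic (plain Java, not Lucene code; the 800-million character 
> count is taken from our file):
> {noformat}
> public class OverflowDemo {
>   public static void main(String[] args) {
>     int length = 800_000_000;    // ~800 million chars, as in our file
>     int maxLen = length * 3;     // 2,400,000,000 wraps around in int
>     System.out.println(maxLen);  // prints -1894967296
> 
>     // Inside ArrayUtil.grow(array, minSize), a negative minSize means
>     // array.length < minSize is false for any array, so the array is
>     // returned unchanged and the UTF-8 encoder writes past its end.
>     byte[] scratchBytes = new byte[64];
>     System.out.println(scratchBytes.length < maxLen);  // prints false
>   }
> }
> {noformat}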
> The code should fail faster, with an error specific to the integer 
> overflow, and it shouldn't cause the IndexWriter to be closed; one possible 
> guard is sketched below.
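> A sketch of such a guard (an illustration only, not the committed patch; it 
> assumes ArrayUtil.MAX_ARRAY_LENGTH as the upper bound): do the worst-case 
> computation in long arithmetic and reject oversized strings up front:
> {noformat}
> @Override
> public void writeString(String string) throws IOException {
>   // compute the worst case in long so the multiplication cannot overflow
>   long maxLen = (long) string.length() * UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR;
>   if (maxLen > ArrayUtil.MAX_ARRAY_LENGTH) {
>     throw new IllegalArgumentException(
>         "string is too large to write: " + string.length() + " chars");
>   }
>   // ... continue with the existing single-pass / double-pass logic,
>   // where (int) maxLen is now safe to use
> }
> {noformat}
> Whether the right behavior is to reject the document or to fall back to the 
> double-pass branch is a separate question; the point is that the failure 
> should surface as a per-document error rather than a closed IndexWriter.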



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
