[ https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734815#action_12734815 ]

solrize commented on SOLR-1283:
-------------------------------

I now have a workaround. The documents I'm indexing don't actually contain
HTML, but the schema was set up to use HTMLStripReader anyway. I switched to
the standard analyzer, the problem went away, and indexing also seems to run
faster than before. I still think the issue needs fixing, since I'm sure some
people use Solr to index large web pages that do need HTML stripping. Anyway,
thanks to Erik H. for advice about this.

> Mark Invalid error on indexing
> ------------------------------
>
>                 Key: SOLR-1283
>                 URL: https://issues.apache.org/jira/browse/SOLR-1283
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.3
>         Environment: Ubuntu 8.04, Sun Java 6
>            Reporter: solrize
>
> When indexing large (1 megabyte) documents, I get a lot of exceptions with
> stack traces like the one below. It happens both in the Solr 1.3 release and
> in the July 9 1.4 nightly. I believe this is NOT the same issue as SOLR-42.
> I found some further discussion on solr-user:
> http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html
>
> In that discussion, Grant asked the original poster to open a Jira issue, but
> I didn't see one, so I'm opening one; please feel free to merge or close it if
> it's redundant.
> My stack trace follows:
> Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=3
> Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: Mark invalid
>         at java.io.BufferedReader.reset(BufferedReader.java:485)
>         at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
>         at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728)
>         at org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742)
>         at java.io.Reader.read(Reader.java:123)
>         at org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108)
>         at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178)
>         at org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84)
>         at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53)
>         at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159)
>         at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
>         at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>         at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>         at org.mortbay.jetty.Server.handle(Server.java:285)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Thanks.
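
The top frames of the trace show HTMLStripReader.restoreState calling java.io.BufferedReader.reset. A BufferedReader throws IOException("Mark invalid") from reset() when, after mark(readAheadLimit), more than readAheadLimit characters have been read past the mark and the reader has had to refill its buffer. The standalone sketch below reproduces only that JDK behaviour with a plain BufferedReader; it does not use HTMLStripReader, and the buffer size, limit, and text are arbitrary values chosen for illustration. A plausible but unconfirmed reading of the trace is that HTMLStripReader's look-ahead can run past its own read-ahead limit on large input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Minimal reproduction (not Solr code) of "java.io.IOException: Mark invalid".
class MarkInvalidDemo {

    // Marks the start of a text, reads charsToRead characters one at a time,
    // then tries to reset() back to the mark.
    // Returns "ok" if reset() succeeded, otherwise the IOException message.
    static String tryReset(int bufferSize, int readAheadLimit, int charsToRead) {
        String text = "x".repeat(1000); // stand-in for a large document
        try (BufferedReader in = new BufferedReader(new StringReader(text), bufferSize)) {
            in.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                in.read();
            }
            // Throws if a buffer refill happened after more than readAheadLimit
            // characters had been consumed since mark().
            in.reset();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Look-ahead stays within the limit and the first buffer: reset works.
        System.out.println(tryReset(16, 5, 3));
        // Look-ahead runs past the limit and forces a refill: "Mark invalid".
        System.out.println(tryReset(16, 5, 100));
    }
}
```

Because the mark is only discarded when the buffer is refilled, a reader that overshoots its limit can still succeed on small inputs, which may be why the error shows up mainly with large documents.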

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
