I managed to resolve this issue. The cause turned out to be a malformed XML file generated by the ruby-solr gem. I installed libxml-ruby and rsolr, and switched from ruby-solr to the rsolr gem.
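For anyone hitting the same error from Ruby, here is a minimal sketch of how one might check a field value for broken UTF-8 before it is sent to Solr. It is not what rsolr or ruby-solr does internally; the helper names `first_invalid_utf8_offset` and `sanitize_utf8` are hypothetical, and it assumes Ruby 2.1 or later for `String#scrub`.

```ruby
# Hypothetical helper (not part of rsolr or ruby-solr): returns the byte
# offset of the first invalid UTF-8 sequence in +text+, or nil if the
# text is valid UTF-8.
def first_invalid_utf8_offset(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return nil if utf8.valid_encoding?

  offset = 0
  utf8.each_char do |ch|
    # Ruby yields each undecodable piece as its own invalid "character".
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Hypothetical helper: replaces every invalid sequence with U+FFFD so
# that the resulting string is valid UTF-8.
def sanitize_utf8(text)
  text.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end

# 0xC3 opens a two-byte sequence, but 0x73 ("s") is not a valid
# continuation byte, which is the same kind of fault the Solr log reports.
bad = "caf\xC3s".b

puts first_invalid_utf8_offset(bad).inspect
puts sanitize_utf8(bad).valid_encoding?
```

Whether to replace bad bytes or reject the document outright depends on the data; replacing them silently can hide an encoding problem further upstream.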
Also, if you face this kind of issue, the test-utf8.sh file included in exampledocs is a good way to test Solr's behavior with UTF-8 characters.

Great work, Solr team, and special thanks to Erik Hatcher.

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> | Google <http://www.google.com/profiles/pranny>


On Mon, Sep 19, 2011 at 15:54, Pranav Prakash <pra...@gmail.com> wrote:

> Just in case someone is interested, here is the log:
>
> SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
>     at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>     at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
>     at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
>     at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
>     at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
>     at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
>     at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
>     at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
>     at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
>     at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>     at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>     ... 26 more
>
> Also, is there a setting to change the depth of the backtrace? It would be helpful to see the complete stack instead of "... 26 more".
>
> On Mon, Sep 19, 2011 at 14:16, Pranav Prakash <pra...@gmail.com> wrote:
>
>> Hi List,
>>
>> I tried Solr 3.4.0 today and got this error while indexing:
>>
>> java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289)
>>
>> My earlier version was Solr 1.4, and the same document was indexed successfully there. Looking around, I see issue https://issues.apache.org/jira/browse/SOLR-2381, which seems to fix this problem. I thought that patch was already applied in Solr 3.4.0. Is there something I am missing?
>>
>> Is there anything else I need to mention? Logs, my document details, etc.?