Yes, I am successfully indexing other files with DIH. Now, when I try to index
these files with ExtractingRequestHandler, I get this error:
null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:122)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
    ... 27 more
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:203)
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:673)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:73)
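
The innermost cause above ("Package should contain a content type part") suggests the file is a readable ZIP that lacks the `[Content_Types].xml` entry every OOXML package needs. A rough way to check suspect files outside Solr is sketched below. It is only an approximation of what POI verifies, not POI's own logic; the class name `OoxmlCheck` and the three result strings are made up for illustration, and the entry name is matched exactly.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Solr, Tika, or POI.
final class OoxmlCheck {

    /**
     * Classifies a file as NOT_A_ZIP, ZIP_WITHOUT_CONTENT_TYPES, or
     * LOOKS_LIKE_OOXML. Other I/O problems (for example a missing file)
     * are propagated as IOException.
     */
    static String diagnose(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            // An OOXML package must carry this part at the root of the archive.
            if (zip.getEntry("[Content_Types].xml") == null) {
                return "ZIP_WITHOUT_CONTENT_TYPES";
            }
            return "LOOKS_LIKE_OOXML";
        } catch (ZipException e) {
            // Truncated or non-ZIP content, e.g. an "invalid END header" failure.
            return "NOT_A_ZIP";
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + diagnose(Paths.get(arg)));
        }
    }
}
```

Running it over the directory being indexed should point out which .docx/.xlsx/.pptx files are truncated or are really something else (for instance an old binary .doc saved with a .docx extension).
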
2016-01-12 1:23 GMT+00:00 Erick Erickson <[email protected]>:
> Looks like a bad file. Do you have any success using DIH on any files?
>
> What happens if you just send that particular file through the
> ExtractingRequestHandler?
>
> Best,
> Erick
>
> On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan
> <[email protected]> wrote:
> > Some MS Word and PDF files are not being indexed using *dataimport*. I get
> > this error:
> >
> > Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
> >     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
> >     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
> >     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
> >     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
> > Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
> >     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
> >     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
> >     ... 3 more
> > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
> >     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
> >     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
> >     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
> >     ... 5 more
> > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
> >     ... 9 more
> > Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
> >     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:112)
> >     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
> >     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
> >     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> >     ... 12 more
> > Caused by: java.util.zip.ZipException: invalid END header (bad central directory offset)
> >     at java.util.zip.ZipFile.open(Native Method)
> >     at java.util.zip.ZipFile.<init>(ZipFile.java:220)
> >     at java.util.zip.ZipFile.<init>(ZipFile.java:150)
> >     at java.util.zip.ZipFile.<init>(ZipFile.java:164)
> >     at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
> >     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:110)
> >     ... 16 more
>
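
For reference, sending one particular file straight to the ExtractingRequestHandler, as suggested in the quoted reply, could look roughly like the sketch below. It assumes the handler is registered at `/update/extract` (as in the example configs) and uses `extractOnly=true` so nothing is indexed while testing. The class name `ExtractOnlyProbe`, the base URL, the core name `mycore`, and the document id are placeholders, not values from this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical probe, not part of Solr.
final class ExtractOnlyProbe {

    /** Builds the extract URL; extractOnly=true asks Solr to return the parse result without indexing. */
    static String buildUrl(String solrBase, String core, String docId) throws IOException {
        return solrBase + "/" + core + "/update/extract"
                + "?literal.id=" + URLEncoder.encode(docId, "UTF-8")
                + "&extractOnly=true";
    }

    /** Posts the raw file bytes as the request body and returns the HTTP status code. */
    static int post(String url, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Generic content type; the intent is to leave type detection to Tika.
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the single suspect file.
        String url = buildUrl("http://localhost:8983/solr", "mycore", "probe-1");
        System.out.println(url + " -> HTTP " + post(url, Paths.get(args[0])));
    }
}
```

A non-200 status for one file while other files pass would be consistent with a damaged document rather than a configuration problem.
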