I've had better luck with -T:

curl -T test_recursive_embedded.docx http://localhost:9998/meta

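A rough sketch of why -T tends to behave differently, in case it helps. By default, curl's --data-binary adds "Content-Type: application/x-www-form-urlencoded" to the request, which matches the "INFO tika (application/x-www-form-urlencoded)" line in the log below, whereas -T sends the raw file via PUT and adds no Content-Type of its own. My assumption, not verified against the server code, is that the server takes that header as the type hint instead of detecting from the bytes. The helper names below (tika_put, tika_put_data_binary, looks_like_pdf) are made up for illustration; the host and port are the ones from this thread.

```shell
# Hypothetical helpers, not part of Tika or curl.

# PUT a file with -T: curl sends the raw bytes and adds no Content-Type header.
# $1 = local file, $2 = endpoint URL (e.g. http://localhost:9998/tika)
tika_put() {
  curl -sS -T "$1" "$2"
}

# Same upload with --data-binary. An empty value after the colon tells curl
# to drop the Content-Type header it would otherwise add for --data-binary.
tika_put_data_binary() {
  curl -sS -X PUT --data-binary "@$1" -H "Content-Type:" "$2"
}

# Sanity check before blaming the server: a PDF starts with the bytes "%PDF-".
looks_like_pdf() {
  [ "$(head -c 5 "$1")" = "%PDF-" ]
}
```

Usage would be something like `looks_like_pdf mydocument.pdf && tika_put mydocument.pdf http://localhost:9998/tika`. If the -T variant works and the plain --data-binary one does not, the default Content-Type header is the likely culprit.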
https://wiki.apache.org/tika/TikaJAXRS

On Wed, May 2, 2018 at 3:04 PM, Hanjan, Harinder <[email protected]> wrote:

> Hello!
>
> I am sending a PDF document to Tika Server and it is being detected as a
> plain text file (see full stack trace at bottom). If I specify
> ‘Content-Type: application/pdf’ in the header of the request, then Tika is
> able to extract content. In the tests below, mydocument.pdf is simply a
> text file I printed to PDF using Google Chrome.
>
> Am I wrong in expecting that Tika determine the type of document without
> any additional help?
>
> Sent:
>
> curl -X PUT http://localhost:9998/tika --data-binary "@mydocument.pdf"
>
> curl -X PUT http://localhost:9998/tika -F "[email protected]"
>
> Received:
>
> HTTP 415 Unsupported Media Type exception
>
> Sent:
>
> curl -X PUT http://localhost:9998/tika --data-binary "@mydocument.pdf" -H "Content-Type: application/pdf"
>
> curl -X PUT http://localhost:9998/meta -F "[email protected]" -H "Content-Type: application/pdf"
>
> Received:
>
> *Text for the PDF*
>
> INFO tika (application/x-www-form-urlencoded)
> WARN tika: Text extraction failed
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.server.resource.TikaResource$1@1469bc28
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:390)
>     at org.apache.tika.server.resource.TikaResource$5.write(TikaResource.java:489)
>     at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
>     at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1414)
>     at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:243)
>     at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:119)
>     at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:82)
>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>     at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>     at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>     at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:274)
>     at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
>     at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:76)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>     at org.eclipse.jetty.server.Server.handle(Server.java:370)
>     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>     at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:973)
>     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1035)
>     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
>     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
>     at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>     at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
>     at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: javax.ws.rs.WebApplicationException: HTTP 415 Unsupported Media Type
>     at org.apache.tika.server.resource.TikaResource$1.parse(TikaResource.java:125)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 32 more
> ERROR Problem with writing the data, class org.apache.tika.server.resource.TikaResource$5, *ContentType: text/plain*
>
> Thanks!
>
> Harinder
