I've had better luck with curl's -T (upload file) option:

curl -T test_recursive_embedded.docx http://localhost:9998/meta

https://wiki.apache.org/tika/TikaJAXRS
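
A likely reason -T works better: with --data-binary, curl adds "Content-Type: application/x-www-form-urlencoded" unless a header overrides it, and that is the type the server logged in the trace below. With -T, curl sends the file as the raw PUT body and adds no Content-Type header, so the server has only the bytes to go on. Here is a minimal sketch of a wrapper; the function name tika_put and the TIKA_URL variable are made up for illustration and are not part of Tika or curl.

```shell
#!/bin/sh
# Hypothetical helper: TIKA_URL and tika_put are illustrative names only.
# The default matches the server address used in this thread.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Upload a file as the raw PUT body.
# curl -T adds no Content-Type header, whereas --data-binary would default to
# application/x-www-form-urlencoded unless -H "Content-Type: ..." is given.
tika_put() {
    # $1 = endpoint (for example tika or meta), $2 = path to the file
    curl -sS -T "$2" "$TIKA_URL/$1"
}
```

For example, `tika_put tika mydocument.pdf` should behave like the curl -T command above, just pointed at /tika instead of /meta.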

On Wed, May 2, 2018 at 3:04 PM, Hanjan, Harinder <[email protected]> wrote:

> Hello!
>
>
>
> I am sending a PDF document to Tika Server and it is being detected as a
> plain text file (see full stack trace at bottom). If I specify
> ‘Content-Type: application/pdf’ in the header of the request, then Tika is
> able to extract content. In the tests below, mydocument.pdf is simply a
> text file I printed to PDF using Google Chrome.
>
>
>
> Am I wrong in expecting that Tika determine the type of document without
> any additional help?
>
>
>
> Sent:
>
>   curl -X PUT http://localhost:9998/tika --data-binary "@mydocument.pdf"
>
>  curl -X PUT http://localhost:9998/tika -F "[email protected]"
>
> Received:
>
>   HTTP 415 Unsupported Media Type exception
>
>
>
> Sent:
>
>   curl -X PUT http://localhost:9998/tika --data-binary "@mydocument.pdf" -H "Content-Type: application/pdf"
>
>   curl -X PUT http://localhost:9998/meta -F "[email protected]" -H "Content-Type: application/pdf"
>
> Received:
>
> *  Text for the PDF*
>
>
>
>
>
> INFO  tika (application/x-www-form-urlencoded)
> WARN  tika: Text extraction failed
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.server.resource.TikaResource$1@1469bc28
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>         at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:390)
>         at org.apache.tika.server.resource.TikaResource$5.write(TikaResource.java:489)
>         at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
>         at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1414)
>         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:243)
>         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:119)
>         at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:82)
>         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>         at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
>         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>         at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>         at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:274)
>         at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
>         at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:76)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>         at org.eclipse.jetty.server.Server.handle(Server.java:370)
>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>         at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:973)
>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1035)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
>         at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>         at java.lang.Thread.run(Unknown Source)
> Caused by: javax.ws.rs.WebApplicationException: HTTP 415 Unsupported Media Type
>         at org.apache.tika.server.resource.TikaResource$1.parse(TikaResource.java:125)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         ... 32 more
> ERROR Problem with writing the data, class org.apache.tika.server.resource.TikaResource$5, *ContentType: text/plain*
>
>
>
>
>
> Thanks!
>
> Harinder
>
