OK, I obtained the GeoSPARQL.pdf file from here: http://www.w3.org/2011/02/GeoSPARQL.pdf
I first tried the following command line:

    curl -T GeoSPARQL.pdf http://localhost:9998/tika --header "Content-type: application/pdf"

I got nothing back from the above curl command, and the server dumped the following on screen, as part of a longer trace:

    Caused by: java.io.IOException: Push back buffer is full

I did some research, and tried starting tika-server as follows to increase the property in question to 1 GB:

    java -Dorg.apache.pdfbox.baseParser.pushBackSize=1073741824 -jar tika-server-1.6.jar

I still got nothing back from the curl command, but this time the server did not produce a stack trace, just the following output:

    Dec 25, 2014 9:40:33 AM org.apache.tika.server.TikaResource logRequest
    INFO: tika (application/pdf)

I have a feeling that I may be missing something rudimentary. I am running tika-server on an AWS Ubuntu instance, and issuing the curl commands from a Windows 7 system. I downloaded and built Tika 1.6 from apache.org/dist/tika, with timestamp 2014-09-05 05:42.

Thanks so much, happy holidays.

On Thu, Dec 25, 2014 at 8:02 AM, Nick Burch <[email protected]> wrote:

> On Wed, 24 Dec 2014, A.M. Sabuncu wrote:
>
>> I am following the examples at http://wiki.apache.org/tika/TikaJAXRS and
>> using the following curl command to test text extraction from PDF files:
>>
>> curl -X PUT -d @GeoSPARQL.pdf http://localhost:9998/tika --header
>> "Content-type: application/pdf"
>>
>
> What happens if you try
>
> curl -T GeoSPARQL.pdf http://localhost:9998/tika --header "Content-type:
> application/pdf"
>
> ? That works fine for me for a test pdf
>
> Nick
>
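
For comparison, the same upload can be sketched in Python, which takes the Windows curl build out of the picture. This is only a minimal sketch under stated assumptions, not something taken from the Tika documentation: the function names build_tika_put and extract_text are hypothetical, and it assumes the server at http://localhost:9998/tika accepts a PUT whose body is the raw PDF bytes, as the curl -T command above sends. One known difference between the two curl forms in this thread is that -d @file strips carriage returns and newlines from the file before sending, while -T (like --data-binary) sends the bytes unchanged; the sketch follows the latter behaviour.

```python
import urllib.request


def build_tika_put(url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a PUT request whose body is the PDF bytes, unmodified.

    Hypothetical helper; only the Content-type header from the
    curl commands in this thread is set.
    """
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf"},
    )


def extract_text(url: str, pdf_path: str, timeout: float = 120.0) -> str:
    """Send the file at pdf_path to the server and return the response body.

    Assumption: the endpoint answers with text that can be decoded as UTF-8.
    """
    # Read in binary mode so no newline translation happens on Windows.
    with open(pdf_path, "rb") as fh:
        pdf_bytes = fh.read()
    request = build_tika_put(url, pdf_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (needs a running tika-server and the PDF in the current directory):
#   print(extract_text("http://localhost:9998/tika", "GeoSPARQL.pdf"))
```

If this returns an empty string as well, the problem is more likely on the server side than in how curl sends the file.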
