I am following the examples at http://wiki.apache.org/tika/TikaJAXRS and using the following curl command to test text extraction from PDF files:

curl -X PUT -d @GeoSPARQL.pdf http://localhost:9998/tika --header "Content-type: application/pdf"

On trivial PDF files (e.g. one created with Word 2010's convert-to-PDF functionality, containing only the text "Testing" and about 81 KB in size), the curl command returns nothing, and on the tika-server side I see the following:

<lots of garbage characters displayed on screen, followed by>
WARNING: Did not found XRef object at specified startxref position 0

Being new to Tika, I would like to know whether I am doing something wrong, or whether PDF parsing is not yet an exact science.

Many thanks in advance.

Sabuncu
