Hi,

I am running Tika's REST service and calling it with curl. To also extract the inline images from a PDF, I send this request:

curl --header "X-Tika-PDFextractInlineImages:true" -T ./test.pdf http://localhost:32768/rmeta > result.json
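Since the client will eventually be Python rather than curl, the same request could be sketched roughly as follows. This is only a minimal sketch using the standard library: the URL and header are copied from the curl command, curl's -T upload is assumed to map to an HTTP PUT, and the function names build_rmeta_request and fetch_rmeta are made up for illustration.

```python
import json
import urllib.request

# Endpoint copied from the curl command; adjust host and port as needed.
TIKA_RMETA_URL = "http://localhost:32768/rmeta"


def build_rmeta_request(pdf_bytes, url=TIKA_RMETA_URL):
    """Build a PUT request mirroring `curl -T ./test.pdf` plus the inline-image header.

    Hypothetical helper name; only the URL and the header come from the original command.
    """
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",  # curl -T uploads the file with PUT
        headers={"X-Tika-PDFextractInlineImages": "true"},
    )


def fetch_rmeta(pdf_path, url=TIKA_RMETA_URL):
    """Send the PDF to the server and return the parsed JSON response."""
    with open(pdf_path, "rb") as fh:
        request = build_rmeta_request(fh.read(), url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running Tika server and a local test.pdf):
#   entries = fetch_rmeta("./test.pdf")
#   with open("result.json", "w", encoding="utf-8") as out:
#       json.dump(entries, out, indent=2)
```

This reproduces the same JSON as the curl call, so it does not by itself return any binary image data.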
The response does contain the document structure (XML/HTML), including image tags such as "<img src=\"embedded:image0.jpg\" alt=\"image0.jpg\" />". I also see separate metadata entries in the output, such as "X-TIKA:embedded_resource_path": "/image0.jpg", but nothing in there resembles binary image data in any form. I have to restrict myself to REST, since I will be using Python to talk to Tika.

Is there any specification for the embedded image data? Is there a way to pull or save the embedded images in their actual binary image format?

Example JSON output attached.

Markus
Attachment: result.json (application/json)
