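1) If you're using the /tika endpoint, embedded files are marked up as such
in the XHTML output with div tags. If you want full information on embedded
files, I'd strongly encourage using the /rmeta endpoint. It returns a JSON
list with one metadata object per document (the container first, then each
embedded file). Each embedded file carries an X-TIKA:embedded_resource_path
value, e.g. /photos.zip/cat.jpg, that records where it sits in the
parent-child hierarchy.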
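A rough sketch (Python, standard library only) of rebuilding the parent-child tree from /rmeta output, for example output saved with `curl -T email.eml http://localhost:9998/rmeta`. It assumes each embedded file's X-TIKA:embedded_resource_path is a slash-separated path whose last segment names that file, and it treats the object without that key as the container. build_children and parent_path are hypothetical helpers, not Tika APIs.

```python
from typing import Dict, List

# Assumed key name: /rmeta uses it to label each embedded file's position.
PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_path(path: str) -> str:
    """'/photos.zip/cat.jpg' -> '/photos.zip'; '/report.pdf' -> '' (the container)."""
    return path.rsplit("/", 1)[0]


def build_children(rmeta: List[dict]) -> Dict[str, List[str]]:
    """Map each document's resource path to the paths of its direct children.

    The container (the object without a resource path) is keyed by ''.
    Entries may arrive in any order, so parent entries are created on demand.
    """
    children: Dict[str, List[str]] = {"": []}
    for doc in rmeta:
        path = doc.get(PATH_KEY)
        if path is None:
            continue  # the container document itself
        children.setdefault(path, [])
        children.setdefault(parent_path(path), []).append(path)
    return children


if __name__ == "__main__":
    # Hypothetical /rmeta output for an email with a PDF and a zip attachment.
    sample = [
        {"Content-Type": "message/rfc822"},
        {PATH_KEY: "/photos.zip/cat.jpg"},
        {PATH_KEY: "/report.pdf"},
        {PATH_KEY: "/photos.zip"},
    ]
    print(build_children(sample))
```

Splitting on "/" is naive: an attachment whose name itself contains a slash, or two attachments with the same name, would need more care.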

2) We don't offer the content itself marked up as JSON, but we do offer a
plain-text option (e.g. /rmeta/text), in which case the extracted text is
returned in the X-TIKA:content field of the JSON output.
See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for details
on how to request text.
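A rough Python sketch of the text option, using only the standard library. It assumes a tika-server listening locally on its default port 9998 and the /rmeta/text variant described on the TikaServer wiki page. rmeta_text and extracted_text are hypothetical helper names, not part of Tika.

```python
import json
import urllib.request
from typing import List

# Assumptions: a local tika-server on its default port, and the /rmeta/text
# variant of the endpoint, which returns plain text instead of XHTML.
TIKA_URL = "http://localhost:9998"
CONTENT_KEY = "X-TIKA:content"


def rmeta_text(path: str, base_url: str = TIKA_URL) -> List[dict]:
    """PUT a file to /rmeta/text and return the parsed JSON list."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(base_url + "/rmeta/text", data=body, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extracted_text(rmeta: List[dict]) -> List[str]:
    """Return each document's plain text, or '' if no content was extracted."""
    return [doc.get(CONTENT_KEY, "") for doc in rmeta]
```

From the command line, roughly the same request is `curl -T email.eml http://localhost:9998/rmeta/text`.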

This might also be useful:
https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared


On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <[email protected]> wrote:

> 1) How does Tika server maintain the parent-child relationship between the
> main document and its embedded documents (e.g. an email with multiple
> attachments) after parsing? Is there any property or tag from which we can
> learn these relationships?
>
> 2) After parsing any document we get all tags in JSON format except the
> *X-Tika-Content* tag, which is in HTML format. Is there any way to get
> this in JSON format?
>
> Please assist.
> Thank You
>
