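1) If you're using the /tika endpoint, embedded files are marked up as such in the XHTML output with div tags. If you want full info on embedded files, I'd strongly encourage using the /rmeta endpoint, which returns a JSON list with one metadata object for the container document and one for each embedded file.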
2) We don't offer content marked up with json, but we do offer a text option, which can be returned in the X-Tika-Content tag in the json output. See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for details on how to request text.

This might also be useful: https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared

On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <[email protected]> wrote:

> 1) How does Tika server maintains Parent-Child relationship between main
> document and it's embedded documents (i.e. Email with multiple attachment)
> after parsing, so is their any property or tag using which we come to know
> relationships?
>
> 2) After parsing any document we are getting all tags in JSON format
> except *X-Tika-Content* tag which is in HTML format so is their any way
> to get this in json format?
>
> Please Assist.
> Thank You
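
In case it helps, here is a rough Python sketch of the two points in my reply: requesting plain-text content from /rmeta/text, and recovering parent-child relationships from the returned list. Treat it as an illustration under assumptions, not a reference implementation. The server URL (localhost:9998), the helper names, and the metadata keys "X-TIKA:embedded_resource_path" and "X-TIKA:content" are what I would expect from a default setup; please check them against the actual output of your server version.

```python
import json
import urllib.request

# Assumed metadata keys; verify against the output of your Tika version.
PATH_KEY = "X-TIKA:embedded_resource_path"
CONTENT_KEY = "X-TIKA:content"


def fetch_rmeta_text(file_path, server="http://localhost:9998"):
    """PUT a file to /rmeta/text and return the parsed JSON list.

    The server address is an assumption (a locally running tika-server).
    """
    with open(file_path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        server + "/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parent_child_pairs(records):
    """Return (parent_path, child_path) tuples for embedded documents.

    A parent of "" means the child sits directly in the container document.
    Records without an embedded path (the container itself) are skipped.
    """
    pairs = []
    for rec in records:
        path = rec.get(PATH_KEY)
        if not path:
            continue
        # "/a.zip/b.txt" -> parent "/a.zip"; "/a.zip" -> parent ""
        parent = path.rsplit("/", 1)[0]
        pairs.append((parent, path))
    return pairs
```

With a running server, something like `parent_child_pairs(fetch_rmeta_text("mail.eml"))` (the file name is hypothetical) should list each attachment next to the path of the document that contains it, and the text of each record would be under the content key.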
