The metadata key to look at is X-TIKA:embedded_resource_path.

For example, "/embed1.zip/embed2.zip/embed2a.txt" says that there's a zip file (embed1.zip) embedded in the main file that contains another zip file (embed2.zip), which in turn contains a text file (embed2a.txt).
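If it helps, here is a rough Python sketch of how the parent of each embedded file could be derived from that path once you have the /rmeta JSON as a list of metadata objects. The `parent_map` helper and the sample list are made up for illustration (they are not part of Tika), and the sketch assumes that embedded file names do not themselves contain "/".

```python
import posixpath

PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_map(rmeta_docs):
    """Map each embedded resource path to its parent's path.

    A parent of "/" means the file sits directly inside the container
    document (the first entry in the /rmeta list).
    """
    parents = {}
    for doc in rmeta_docs:
        path = doc.get(PATH_KEY)
        if path is None:
            # The container document carries no embedded resource path.
            continue
        # Assumption: "/" only ever separates nesting levels.
        parents[path] = posixpath.dirname(path) or "/"
    return parents


# Hypothetical, heavily trimmed /rmeta output for the example above.
sample = [
    {"resourceName": "main.zip"},
    {PATH_KEY: "/embed1.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip/embed2a.txt"},
]

relations = parent_map(sample)
for child, parent in relations.items():
    print(f"{child} -> parent: {parent}")
```

Walking the map from any child up to "/" gives the full chain of containers for that file.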
On Mon, Oct 24, 2022 at 10:13 AM Chetan Bikire wrote:

> Hi Tim,
>
> Thank you for your response.
> Yes, I am using the /rmeta/form endpoint and I am getting info on embedded
> files separately, but I am not getting information about which parent each
> embedded file belongs to, so I cannot track the chain of multilevel embedded
> files.
> Is there any metadata property that tells us this?
>
> On Sat, Oct 22, 2022, 16:06 Tim Allison wrote:
>
>> 1) If you're using the /tika endpoint, embedded files are marked up as
>> such in the xhtml output with div tags. If you want full info on embedded
>> files, I'd strongly encourage using the /rmeta endpoint.
>>
>> 2) We don't offer content marked up with json, but we do offer a text
>> option, which can be returned in the X-Tika-Content tag in the json output.
>> See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for
>> details on how to request text.
>>
>> This might also be useful:
>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>
>> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire wrote:
>>
>>> 1) How does Tika server maintain the parent-child relationship between the
>>> main document and its embedded documents (e.g. an email with multiple
>>> attachments) after parsing? Is there any property or tag from which we can
>>> learn these relationships?
>>>
>>> 2) After parsing any document we get all tags in JSON format
>>> except the *X-Tika-Content* tag, which is in HTML format. Is there any way
>>> to get this in JSON format?
>>>
>>> Please assist.
>>> Thank you
