Yes, we get this when multiple .eml files are embedded, i.e. eml1/eml2/eml3. But for the message file format (.msg) we do not get the name of the file, and we get an exception like "unsupported attachment chunk property will be ignored".
For example, with msg1/msg2/embed.txt:

Embedded_Resource_Path : "/__substg1.0.3701000D/__substg1.0.3701000D/embed.txt"

There is also the scenario where the main file is an .eml and all embedded
files are .msg (eml/msg1/msg2/msg3). In that case the Tika content of the
main .eml file includes the content of all the .msg files as well, instead
of treating these .msg files as separate embedded files.

Please assist.
Thank you.

On Mon, Oct 24, 2022, 20:06 Tim Allison wrote:

> X-TIKA:embedded_resource_path
>
> For example, "/embed1.zip/embed2.zip/embed2a.txt", says that there's a zip
> file (embed1.zip) embedded in the main file that contains another zip file
> (embed2.zip), which in turn contains a text file (embed2a.txt).
>
> On Mon, Oct 24, 2022 at 10:13 AM Chetan Bikire wrote:
>
>> Hi Tim,
>>
>> Thank you for your response.
>> Yes, I am using the /rmeta/form endpoint and I am getting info on embedded
>> files separately, but I am not getting information on which parent each
>> embedded file belongs to, so I cannot track the chain of multilevel
>> embedded files.
>> So is there any metadata property which tells us this?
>>
>> On Sat, Oct 22, 2022, 16:06 Tim Allison wrote:
>>
>>> 1) If you're using the /tika endpoint, embedded files are marked up as
>>> such in the xhtml output with div tags. If you want full info on embedded
>>> files, I'd strongly encourage using the /rmeta endpoint.
>>>
>>> 2) We don't offer content marked up with json, but we do offer a text
>>> option, which can be returned in the X-Tika-Content tag in the json output.
>>> See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for
>>> details on how to request text.
>>>
>>> This might also be useful:
>>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>
>>> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire wrote:
>>>
>>>> 1) How does Tika server maintain the parent-child relationship between
>>>> the main document and its embedded documents (i.e. an email with
>>>> multiple attachments) after parsing? Is there any property or tag from
>>>> which we can learn these relationships?
>>>>
>>>> 2) After parsing any document we get all tags in JSON format
>>>> except the *X-Tika-Content* tag, which is in HTML format. Is there any
>>>> way to get this in JSON format?
>>>>
>>>> Please assist.
>>>> Thank you
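
The X-TIKA:embedded_resource_path convention described in this thread can be turned into a parent lookup by dropping the last path segment. The following is only a minimal sketch, assuming the /rmeta output has already been decoded into a list of metadata dictionaries and that the main file carries no path value. The function name `parent_paths` and the sample data are hypothetical, and the approach breaks if an attachment name itself contains "/".

```python
# Key name as quoted in this thread.
PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_paths(rmeta_docs):
    """Map each embedded resource path to its parent's path.

    An empty string as the parent means "directly inside the main file".
    Hypothetical helper: assumes path segments never contain "/".
    """
    parents = {}
    for doc in rmeta_docs:
        path = doc.get(PATH_KEY)
        if not path:
            continue  # assumed to be the main (container) document
        # Dropping the last segment leaves the parent's path.
        parents[path] = path.rsplit("/", 1)[0]
    return parents


# Illustrative data modelled on the zip example in the thread,
# not real Tika output.
sample = [
    {},  # main file
    {PATH_KEY: "/embed1.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip/embed2a.txt"},
]

for child, parent in parent_paths(sample).items():
    print(child, "<-", parent or "(main file)")
```

For the .msg case reported above, the segments would be the `__substg1.0.3701000D` names rather than file names, so the chain depth is still recoverable even though the attachment names are not.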
