Please let me know if I am missing something. Any feedback here is welcome.
Thanks & Regards
Chetan

On Tue, Nov 8, 2022, 16:07 Chetan Bikire <[email protected]> wrote:

> Yes, we get this if there are multiple eml files embedded, i.e.
> eml1/eml2/eml3.
> But in the case of the message file format (.msg), we do not get the name
> of the file, and we get an exception like: unsupported attachment chunk
> property will be ignored.
>
> Ex- (.msg1/msg2/embed.text)
>
> Embedded_Resource_Path :
> "/__substg1.0.3701000D/__substg1.0.3701000D/embed.txt"
>
> Also, in the scenario where the main file is .eml and all embedded files
> are .msg (eml/msg1/msg2/msg3), the Tika content of the main eml file
> includes the content of all .msg files as well, instead of treating these
> msg files as separate embedded files.
>
> Please assist.
> Thank you.
>
> On Mon, Oct 24, 2022, 20:06 Tim Allison <[email protected]> wrote:
>
>> X-TIKA:embedded_resource_path
>>
>> For example, "/embed1.zip/embed2.zip/embed2a.txt" says that there's a
>> zip file (embed1.zip) embedded in the main file that contains another zip
>> file (embed2.zip), which in turn contains a text file (embed2a.txt).
>>
>> On Mon, Oct 24, 2022 at 10:13 AM Chetan Bikire <[email protected]>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thank you for your response.
>>> Yes, I am using the /rmeta/form endpoint, and I am getting info on
>>> embedded files separately, but I am not getting information about which
>>> parent each embedded file belongs to, which I need in order to track the
>>> chain of multilevel embedded files.
>>> Is there any metadata property that tells us this?
>>>
>>> On Sat, Oct 22, 2022, 16:06 Tim Allison <[email protected]> wrote:
>>>
>>>> 1) If you're using the /tika endpoint, embedded files are marked up as
>>>> such in the xhtml output with div tags. If you want full info on embedded
>>>> files, I'd strongly encourage using the /rmeta endpoint.
>>>>
>>>> 2) We don't offer content marked up with json, but we do offer a text
>>>> option, which can be returned in the X-Tika-Content tag in the json output.
>>>> See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for
>>>> details on how to request text.
>>>>
>>>> This might also be useful:
>>>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>
>>>> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <[email protected]>
>>>> wrote:
>>>>
>>>>> 1) How does Tika server maintain the parent-child relationship between
>>>>> the main document and its embedded documents (e.g. an email with
>>>>> multiple attachments) after parsing? Is there any property or tag from
>>>>> which we can determine these relationships?
>>>>>
>>>>> 2) After parsing any document we get all tags in JSON format
>>>>> except the *X-Tika-Content* tag, which is in HTML format. Is there any
>>>>> way to get this in JSON format?
>>>>>
>>>>> Please assist.
>>>>> Thank you.
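
For anyone following the thread, here is a rough sketch of how the parent of each embedded file could be derived from the X-TIKA:embedded_resource_path value Tim describes. It is only an illustration, not part of Tika: the helper names (parent_of, build_tree) and the sample records are made up, and it assumes that the main document carries no embedded path and that file names do not themselves contain "/".

```python
# Sketch: derive parent/child links from the embedded resource path.
# PATH_KEY is the metadata key named in the thread; everything else here
# (function names, sample records) is hypothetical.
PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_of(resource_path):
    """Return the path of the parent, or "/" when the parent is the main file."""
    parent, _, _ = resource_path.rpartition("/")
    return parent or "/"


def build_tree(records):
    """Map each parent path to the list of paths embedded directly in it."""
    children = {}
    for record in records:
        path = record.get(PATH_KEY)
        if path is None:
            continue  # assumed to be the main (container) document
        children.setdefault(parent_of(path), []).append(path)
    return children


# Hand-written records mirroring the example from the thread,
# not real server output.
sample = [
    {"hypothetical_name": "main file"},
    {PATH_KEY: "/embed1.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip/embed2a.txt"},
]

tree = build_tree(sample)
```

With the sample above, tree maps "/" to ["/embed1.zip"], "/embed1.zip" to ["/embed1.zip/embed2.zip"], and "/embed1.zip/embed2.zip" to the text file. Note that this only works when the path segments are meaningful; in the .msg case reported above, the segments are "__substg1.0.3701000D" rather than file names, so the chain depth can still be recovered but not the attachment names.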
