>But in case of message file format(.msg) not getting name of file and get
>exception like unsupported attachment chunk property will be ignored.

Is this an exception or a logged message?

>Embedded_Resource_Path : "/__substg1.0.3701000D/__substg1.0.3701000D/embed.txt"

This looks basically right to me given how embedded files are stored in msg files. Is this what you expect? If not, what do you expect?

>And also, scenario where main file which is .eml and all embedded files are
>.msg (eml/msg1/msg2/msg3) then the Tika- Content of main eml file includes
>content of all .msg files as well, instead of treating these msg files as
>separate embedded files.

This sounds bad. Any chance you can share an example, even if privately? We can't do much without an example to work with.

Thank you and sorry for my delay.

On Wed, Nov 9, 2022 at 8:40 AM Chetan Bikire <[email protected]> wrote:
>
> Please let me know if I am missing something.
>
> Any feedback here is welcome.
>
> Thanks & Regards
> Chetan
>
>
> On Tue, Nov 8, 2022, 16:07 Chetan Bikire <[email protected]> wrote:
>>
>> Yes we get this if there are multiple eml files embedded i.e. eml1/eml2/eml3
>> But in case of message file format(.msg) not getting name of file and get
>> exception like unsupported attachment chunk property will be ignored.
>>
>> Ex- (.msg1/msg2/embed.text)
>>
>> Embedded_Resource_Path :
>> "/__substg1.0.3701000D/__substg1.0.3701000D/embed.txt"
>>
>> And also, scenario where main file which is .eml and all embedded files are
>> .msg (eml/msg1/msg2/msg3) then the Tika- Content of main eml file includes
>> content of all .msg files as well, instead of treating these msg files as
>> separate embedded files.
>>
>> Please assist
>> Thank you.
>>
>>
>>
>> On Mon, Oct 24, 2022, 20:06 Tim Allison <[email protected]> wrote:
>>>
>>> X-TIKA:embedded_resource_path
>>>
>>> For example, "/embed1.zip/embed2.zip/embed2a.txt", says that there's a zip
>>> file (embed1.zip) embedded in the main file that contains another zip file
>>> (embed2.zip), which in turn contains a text file (embed2a.txt).
>>>
>>> On Mon, Oct 24, 2022 at 10:13 AM Chetan Bikire <[email protected]> wrote:
>>>>
>>>> Hi Tim,
>>>>
>>>> Thank you for your response.
>>>> Yes, I am using /rmeta/form endpoint and I am getting info on embedded
>>>> files separately but not getting information for which parent this
>>>> embedded file belongs to so that I can track the chain of multilevel
>>>> embedded files.
>>>> So do you have any meta property which tells us regarding this.
>>>>
>>>> On Sat, Oct 22, 2022, 16:06 Tim Allison <[email protected]> wrote:
>>>>>
>>>>> 1) If you're using the /tika endpoint, embedded files are marked up as
>>>>> such in the xhtml output with div tags. If you want full info on
>>>>> embedded files, I'd strongly encourage using the /rmeta endpoint.
>>>>>
>>>>> 2) We don't offer content marked up with json, but we do offer a text
>>>>> option, which can be returned in the X-Tika-Content tag in the json
>>>>> output. See https://cwiki.apache.org/confluence/display/TIKA/TikaServer
>>>>> for details on how to request text.
>>>>>
>>>>> This might also be useful:
>>>>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>>
>>>>>
>>>>> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> 1) How does Tika server maintain Parent-Child relationship between main
>>>>>> document and its embedded documents (i.e. Email with multiple
>>>>>> attachment) after parsing, so is there any property or tag using which
>>>>>> we come to know relationships?
>>>>>>
>>>>>> 2) After parsing any document we are getting all tags in JSON format
>>>>>> except X-Tika-Content tag which is in HTML format so is there any way to
>>>>>> get this in json format?
>>>>>>
>>>>>> Please Assist.
>>>>>> Thank You
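
For anyone tracking the chain of multilevel embedded files from /rmeta output, here is a minimal Python sketch of how the X-TIKA:embedded_resource_path value described in this thread could be turned into a child-to-parent map. It assumes the response has already been parsed into a list of metadata dicts, that the main document carries no embedded_resource_path entry, and that attachment names do not themselves contain "/". The function name parent_paths and the sample data are hypothetical and only mirror the zip example given above; this is not Tika's own API.

```python
# Key named in the thread; each embedded document's metadata carries it.
PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_paths(metadata_list):
    """Map each embedded resource path to its parent's path.

    An empty string as the parent means "directly inside the main file".
    Entries without PATH_KEY (assumed to be the main document) are skipped.
    """
    parents = {}
    for metadata in metadata_list:
        path = metadata.get(PATH_KEY)
        if not path:
            continue
        # Everything before the last "/" is the parent's own path.
        parents[path] = path.rpartition("/")[0]
    return parents


# Hypothetical sample shaped like a parsed /rmeta response.
sample_rmeta = [
    {"X-TIKA:content": "main file text"},
    {PATH_KEY: "/embed1.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip/embed2a.txt"},
]

for child, parent in parent_paths(sample_rmeta).items():
    print(child, "<-", parent or "(main file)")
```

With .msg attachments the path segments may be storage names such as "__substg1.0.3701000D" rather than file names, so the map still gives the nesting but not necessarily a readable name for each level.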
