Please let me know if I am missing something.

Any feedback here is welcome.

Thanks & Regards
Chetan


On Tue, Nov 8, 2022, 16:07 Chetan Bikire <[email protected]> wrote:

> Yes, we get this if there are multiple .eml files embedded, i.e.
> eml1/eml2/eml3.
> But with the message file format (.msg), we do not get the file name, and we
> get an exception like "unsupported attachment chunk property will be ignored".
>
> Example: (.msg1/msg2/embed.txt)
>
> Embedded_Resource_Path :
> "/__substg1.0.3701000D/__substg1.0.3701000D/embed.txt"
>
> Also, in the scenario where the main file is an .eml and all embedded files
> are .msg (eml/msg1/msg2/msg3), the Tika content of the main eml file
> includes the content of all the .msg files as well, instead of treating these
> msg files as separate embedded files.
>
> Please assist
> Thank you.
>
>
>
> On Mon, Oct 24, 2022, 20:06 Tim Allison <[email protected]> wrote:
>
>> X-TIKA:embedded_resource_path
>>
>> For example, "/embed1.zip/embed2.zip/embed2a.txt", says that there's a
>> zip file (embed1.zip) embedded in the main file that contains another zip
>> file (embed2.zip), which in turn contains a text file (embed2a.txt).
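To make the path convention above concrete, here is a minimal Python sketch that rebuilds the parent/child chain from a list of /rmeta records. It assumes each embedded record carries X-TIKA:embedded_resource_path, that the main (container) document has no such key, and that file names do not themselves contain "/". The helper names and the sample records are hypothetical, not Tika output.

```python
from collections import defaultdict

PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_of(path: str) -> str:
    """Return the parent's path; "" stands for the main (container) file."""
    # "/embed1.zip/embed2.zip/embed2a.txt" -> "/embed1.zip/embed2.zip"
    return path.rsplit("/", 1)[0]


def build_tree(records):
    """Map each parent path to the list of paths embedded directly in it."""
    children = defaultdict(list)
    for record in records:
        path = record.get(PATH_KEY, "")
        if path:  # assumption: the container record has no path key
            children[parent_of(path)].append(path)
    return dict(children)


# Hypothetical records shaped like the example in the message above.
sample = [
    {},  # main file
    {PATH_KEY: "/embed1.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip/embed2a.txt"},
]
tree = build_tree(sample)
```

Looking up tree[""] gives the attachments of the main file, and looking up any other path gives the files embedded directly inside it.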
>>
>> On Mon, Oct 24, 2022 at 10:13 AM Chetan Bikire <[email protected]>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thank you for your response.
>>> Yes, I am using the /rmeta/form endpoint and I am getting info on embedded
>>> files separately, but I am not getting information about which parent each
>>> embedded file belongs to, so that I can track the chain of multilevel
>>> embedded files.
>>> Is there any metadata property that tells us this?
>>>
>>> On Sat, Oct 22, 2022, 16:06 Tim Allison <[email protected]> wrote:
>>>
>>>> 1) If you're using the /tika endpoint, embedded files are marked up as
>>>> such in the xhtml output with div tags.  If you want full info on embedded
>>>> files, I'd strongly encourage using the /rmeta endpoint.
>>>>
>>>> 2) We don't offer content marked up with json, but we do offer a text
>>>> option, which can be returned in the X-Tika-Content tag in the json output.
>>>> See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for
>>>> details on how to request text.
>>>>
>>>> This might also be useful:
>>>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>
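As a rough illustration of the text option mentioned above, the sketch below prepares a PUT request to /rmeta/text using only the Python standard library. The base URL (the default local tika-server port 9998), the function names, and the X-TIKA:content key name mentioned in the comments are assumptions here; check them against your server version.

```python
import json
import urllib.request


def build_rmeta_text_request(data: bytes,
                             base_url: str = "http://localhost:9998"):
    """Prepare (but do not send) a PUT to /rmeta/text for the given file bytes."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def fetch_records(data: bytes, base_url: str = "http://localhost:9998"):
    """Send the request and decode the JSON list of metadata records.

    Needs a running Tika server; with /rmeta/text the content field
    (assumed key: "X-TIKA:content") should hold plain text, not XHTML.
    """
    request = build_rmeta_text_request(data, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not touch the network.
req = build_rmeta_text_request(b"example bytes")
```

Only fetch_records contacts the server; it is defined here but not called.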
>>>>
>>>> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <[email protected]>
>>>> wrote:
>>>>
>>>>> 1) How does Tika server maintain the parent-child relationship between
>>>>> the main document and its embedded documents (e.g. an email with multiple
>>>>> attachments) after parsing? Is there any property or tag from which we
>>>>> can determine these relationships?
>>>>>
>>>>> 2) After parsing any document we get all tags in JSON format
>>>>> except the *X-Tika-Content* tag, which is in HTML format. Is there any
>>>>> way to get this in JSON format?
>>>>>
>>>>> Please Assist.
>>>>> Thank You
>>>>>
>>>>
