X-TIKA:embedded_resource_path

For example, "/embed1.zip/embed2.zip/embed2a.txt" says that there's a zip
file (embed1.zip) embedded in the main file that contains another zip file
(embed2.zip), which in turn contains a text file (embed2a.txt).
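
To make the chain concrete, here is a minimal sketch of how the parent of each
embedded file could be recovered from that path on the client side. It assumes
the /rmeta JSON has already been loaded into a list of dicts, and that the
container document is the entry without an X-TIKA:embedded_resource_path key.
The helper names (parent_path, build_tree) and the sample data are made up for
illustration and are not part of Tika. It also assumes file names do not
themselves contain "/".

```python
import posixpath

PATH_KEY = "X-TIKA:embedded_resource_path"


def parent_path(embedded_path):
    """Return the parent's embedded resource path, or None when the
    parent is the main (container) document."""
    parent = posixpath.dirname(embedded_path)
    return None if parent in ("", "/") else parent


def build_tree(rmeta_docs):
    """Map each parent path (None = main document) to its direct children."""
    children = {}
    for doc in rmeta_docs:
        path = doc.get(PATH_KEY)
        if path is None:
            continue  # assumed to be the container document itself
        children.setdefault(parent_path(path), []).append(path)
    return children


# Hypothetical, heavily trimmed /rmeta output for the example above.
docs = [
    {"resourceName": "main.docx"},
    {PATH_KEY: "/embed1.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip"},
    {PATH_KEY: "/embed1.zip/embed2.zip/embed2a.txt"},
]

tree = build_tree(docs)
for parent, kids in tree.items():
    print(parent or "<main document>", "->", kids)
```

Walking the resulting dict from the None key downward gives the full
multilevel chain.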

On Mon, Oct 24, 2022 at 10:13 AM Chetan Bikire <[email protected]> wrote:

> Hi Tim,
>
> Thank you for your response.
> Yes, I am using the /rmeta/form endpoint and I am getting info on embedded
> files separately, but I am not getting information about which parent each
> embedded file belongs to, so I cannot track the chain of multilevel embedded
> files.
> Is there a metadata property that tells us this?
>
> On Sat, Oct 22, 2022, 16:06 Tim Allison <[email protected]> wrote:
>
>> 1) If you're using the /tika endpoint, embedded files are marked up as
>> such in the xhtml output with div tags.  If you want full info on embedded
>> files, I'd strongly encourage using the /rmeta endpoint.
>>
>> 2) We don't offer content marked up with json, but we do offer a text
>> option, which can be returned in the X-Tika-Content tag in the json output.
>> See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for
>> details on how to request text.
>>
>> This might also be useful:
>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>
>>
>> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <[email protected]>
>> wrote:
>>
>>> 1) How does Tika server maintain the parent-child relationship between the
>>> main document and its embedded documents (e.g. an email with multiple
>>> attachments) after parsing? Is there any property or tag from which we can
>>> determine the relationships?
>>>
>>> 2) After parsing a document we get all tags in JSON format
>>> except the *X-Tika-Content* tag, which is in HTML format. Is there any way
>>> to get this in JSON format?
>>>
>>> Please Assist.
>>> Thank You
>>>
>>
