That's great to hear, Tim, thank you! I will definitely provide feedback.

Until this gets into 3.0 officially, is there something I can prototype with
/rmeta to help me get my other work going? Any suggestions on an approach,
or a draft PR for the official feature, would be very helpful.
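
One possible stopgap, as a rough sketch and not the planned TIKA-4207 design:
send the same file to /rmeta and to /unpack/all, then keep both results
together. The address http://localhost:9998 and the helper names put, combine
and fetch_both are assumptions made for illustration. The sketch does not try
to match zip entries to /rmeta records, since the entry names are not
guaranteed to line up with the embedded resource paths.

```python
import io
import json
import urllib.request
import zipfile

TIKA_URL = "http://localhost:9998"  # assumption: a locally running tika-server


def put(path: str, data: bytes, accept: str) -> bytes:
    """PUT the document bytes to a tika-server endpoint; return the raw body."""
    req = urllib.request.Request(
        TIKA_URL + path, data=data, method="PUT", headers={"Accept": accept}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def combine(rmeta_json: bytes, unpack_zip: bytes) -> dict:
    """Bundle the /rmeta metadata list with the raw entries of an /unpack/all zip."""
    metadata = json.loads(rmeta_json)  # one JSON object per container/embedded doc
    entries = {}
    if unpack_zip:  # an empty body means nothing was unpacked
        with zipfile.ZipFile(io.BytesIO(unpack_zip)) as zf:
            entries = {name: zf.read(name) for name in zf.namelist()}
    return {"metadata": metadata, "bytes_by_entry": entries}


def fetch_both(data: bytes) -> dict:
    """Hypothetical helper: two requests, so the file is parsed twice."""
    rmeta = put("/rmeta", data, "application/json")
    unpacked = put("/unpack/all", data, "application/zip")
    return combine(rmeta, unpacked)
```

The obvious cost is that every file is parsed twice, and pairing each child's
bytes with its metadata is left to the caller.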

On Tue, Mar 12, 2024 at 5:53 AM Tim Allison <[email protected]> wrote:

> Stay tuned! Coming soon: https://issues.apache.org/jira/browse/TIKA-4207
>
> I think I'll be wiring this into the /pipes and /async endpoints. The json
> request will specify that you want bytes AND text+metadata.
>
> There will be two options:
> a) you specify two emitters: one for json and one for raw bytes
> b) you specify one emitter, and the json and raw bytes are packaged in a
> zip
>
> I'd really appreciate feedback on the design of this feature and any help
> finding bugs!
>
> Best,
>
>         Tim
>
> On Tue, Mar 12, 2024 at 3:17 AM Zig Zag <[email protected]> wrote:
>
>> Hi All,
>>
>> I am trying to build a pipeline that needs to process content recursively
>> and store the binary bytes of all embedded children in addition to their
>> text and other metadata.
>>
>> I was looking at two options:
>>
>> 1. Using Tika's /rmeta API and having my code call it synchronously:
>> - Is there a way for me to get the bytes for embedded children when doing
>> this? Basically, some way to combine what /unpack/all does with /rmeta.
>> - If it's not built in, is there any guidance on extending my own recursive
>> handler to do this? I'd like to keep tika-server as is and just configure
>> this extension so I can keep up with updates.
>>
>> 2. Using /async or /pipes. With this I had two questions:
>> - Is there an emitter configuration to commit both bytes and text for all
>> children?
>> - Is there a way for me to pass in input with my HTTP request and use an
>> emitter only for storage (basically some sort of fetcher that uses the
>> input request stream)? This would help me avoid one external request.
>>
>> Thank you for any help!
>> Samuel
>>
>
