That's great to hear, Tim, thank you! I will definitely provide feedback. Until this gets into 3.0 officially, is there something I can prototype with /rmeta to help me get my other work going? Any suggestions on an approach, or a draft PR for the official feature, would be very helpful.
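
As a rough sketch of one possible stop-gap (an assumption on my side, not something suggested in this thread): send the same file to /rmeta for the recursive text+metadata and to /unpack/all for the embedded bytes, then combine the two results client-side. The server address and the helper names below are hypothetical, and the sketch does not attempt to match zip entries back to the /rmeta entries.

```python
import io
import json
import urllib.request
import zipfile

# Assumption: tika-server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def _put(path, data, accept):
    """PUT raw file bytes to a tika-server endpoint and return the response body."""
    req = urllib.request.Request(
        TIKA_URL + path, data=data, method="PUT", headers={"Accept": accept}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def read_zip_entries(zip_bytes):
    """Return {entry name: bytes} for a zip payload; empty dict for an empty body."""
    if not zip_bytes:
        return {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def rmeta_plus_bytes(data):
    """Hypothetical helper: fetch recursive metadata and embedded bytes separately.

    Costs two requests (and two parses) per file, and leaves pairing the
    zip entries with the /rmeta entries to the caller.
    """
    metadata = json.loads(_put("/rmeta", data, "application/json"))
    children = read_zip_entries(_put("/unpack/all", data, "application/zip"))
    return metadata, children
```

Calling `rmeta_plus_bytes(open("sample.docx", "rb").read())` against a running server would return the /rmeta list and a dict of unpacked entries. The obvious drawback is that every file is parsed twice, which is exactly what a built-in option would avoid.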

On Tue, Mar 12, 2024 at 5:53 AM Tim Allison <[email protected]> wrote:

> Stay tuned! Coming soon: https://issues.apache.org/jira/browse/TIKA-4207
>
> I think I'll be wiring this into the /pipes and /async endpoints. The json
> request will specify that you want bytes AND text+metadata.
>
> There will be two options:
> a) you specify two emitters: one for json and one for raw bytes
> b) you specify one emitter, and the json and raw bytes are packaged in a
> zip
>
> I'd really appreciate feedback on the design of this feature and any help
> finding bugs!
>
> Best,
>
> Tim
>
> Cheers,
>
> Tim
>
> On Tue, Mar 12, 2024 at 3:17 AM Zig Zag <[email protected]> wrote:
>
>> Hi All,
>>
>> I am trying to build a pipeline that needs to process content recursively
>> and store the binary bytes of all embedded children in addition to their
>> text and other metadata.
>>
>> I was looking at two options:
>>
>> 1. using Tika's /rmeta API and having my code just call it synchronously
>> - is there a way for me to get bytes for embedded children when doing this
>> ? basically some way to smoosh together what /unpack/all does into /rmeta.
>> - if it's not built-in any guidance on extending my own recursive
>> handler to do this ?. i'd like to keep tika-server as is and just configure
>> this extension so I can keep up with updates.
>>
>> 2. using /async or /pipes - with this I had 2 questions:
>> - Is there emitter configuration to commit both bytes and text for all
>> children ?
>> - is there a way for me to pass in input with my HTTP request, and use a
>> emitter only for storage (basically some sort of fetcher that uses the
>> input request stream - this will help me avoid one external request).
>>
>> Thank you for any help!,
>> Samuel
>>
>
