Isn’t that what /rmeta does?
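
For what it's worth, here is a minimal Python sketch of that approach, assuming tika-server is listening on localhost:9998 as in the thread below. It PUTs a file to /rmeta/text, which returns a JSON list with one metadata object per document (the container plus each embedded file) and plain text in X-TIKA:content. The `flatten` helper and its output layout are hypothetical and only illustrate the "metadata plus text in one call" idea; they are not part of Tika.

```python
import json
import urllib.request

# Assumption: tika-server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def fetch_rmeta_text(path, base_url=TIKA_URL):
    """PUT a file to /rmeta/text and return the parsed JSON list."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(
            f"{base_url}/rmeta/text",
            data=fh.read(),
            method="PUT",
            headers={"Accept": "application/json"},
        )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten(entries, content_key="X-TIKA:content"):
    """Hypothetical helper: for each document, emit its metadata lines,
    a blank line, then its extracted text."""
    blocks = []
    for entry in entries:
        meta = [f"{k}: {v}" for k, v in sorted(entry.items()) if k != content_key]
        body = (entry.get(content_key) or "").strip()
        blocks.append("\n".join(meta + ["", body]))
    return "\n\n".join(blocks)
```

With a server running, `print(flatten(fetch_rmeta_text("archive.zip")))` would print the metadata and text for the archive and for every file inside it (`archive.zip` is a placeholder name).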
>
> On Thu, May 6, 2021 at 8:03 AM Cristian Zamfir <[email protected]> wrote:
>
>> Cool, this will work! Looking forward to release 1.27.
>> It does not work for archives though, is there a way to also get
>> recursively the metadata from all the files in the archive using tika/text
>> accept: application/json?
>> I suppose I can always wrap around the Tika library and implement this
>> functionality, but I would have preferred to use the docker container
>> instead.
>>
>> On Thu, May 6, 2021 at 1:09 PM Tim Allison <[email protected]> wrote:
>>
>>> Reply all...
>>>
>>> On Thu, May 6, 2021 at 7:08 AM Tim Allison <[email protected]> wrote:
>>> >
>>> > Thank you for giving it a try! Yes, there is overhead with parsing
>>> > json, and it isn't streaming. If you want text in the content field,
>>> > try /tika/text (Accept: application/json)
>>> >
>>> > On Thu, May 6, 2021 at 5:44 AM Cristian Zamfir <[email protected]> wrote:
>>> > >
>>> > > Thanks! I checked version 1.27 and it does what is expected.
>>> > > However, the extra handling of the JSON will incur some processing
>>> > > overhead - not strictly necessary for my use case I think. Also, the
>>> > > content in X-TIKA:content is html and I would need plain text.
>>> > > What would be ideal would be an option to /tika (text|body) to
>>> > > essentially do what /rmeta provides and concatenate in the output the
>>> > > metadata and the data. Something like
>>> > > `curl -H "Accept: text/plain" -H "X-Tika-meta: recursive" http://localhost:9998/tika`?
>>> > > What do you think, does it make sense?
>>> > >
>>> > > Thanks,
>>> > > Cristi
>>> > >
>>> > > On Wed, May 5, 2021 at 9:29 PM Tim Allison <[email protected]> wrote:
>>> > >>
>>> > >> All,
>>> > >> I recently added a feature matrix page to our wiki for some of the
>>> > >> content +/- metadata endpoints in tika-server:
>>> > >> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>> > >> Please take a look and let me know what you think.
>>> > >>
>>> > >> Cheers,
>>> > >>
>>> > >> Tim
>>> > >>
>>> > >> On Wed, May 5, 2021 at 2:15 PM Tim Allison <[email protected]> wrote:
>>> > >> >
>>> > >> > Here’s a recent build if you want to check it out:
>>> > >> > https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/128/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.27-20210505.171622-28/tika-server-1.27-20210505.171622-28.jar
>>> > >> >
>>> > >> > On Wed, May 5, 2021 at 8:05 AM Tim Allison <[email protected]> wrote:
>>> > >> >>
>>> > >> >> My guess would be a month(ish)? Depends on what the community decides...
>>> > >> >>
>>> > >> >> On Wed, May 5, 2021 at 5:59 AM Cristian Zamfir <[email protected]> wrote:
>>> > >> >> >
>>> > >> >> > Great. When is 1.27 likely to be released?
>>> > >> >> >
>>> > >> >> > Thanks!
>>> > >> >> > Cristi
>>> > >> >> >
>>> > >> >> > On Wed, May 5, 2021 at 11:32 AM Tim Allison <[email protected]> wrote:
>>> > >> >> >>
>>> > >> >> >> In 1.27, there’s an accept:application/json option for the /tika
>>> > >> >> >> endpoint that will do this. If you can build locally or grab a
>>> > >> >> >> build from Jenkins, please give it a try before the 1.27 release.
>>> > >> >> >>
>>> > >> >> >> See also /rmeta.
>>> > >> >> >>
>>> > >> >> >> On Wed, May 5, 2021 at 5:20 AM Cristian Zamfir <[email protected]> wrote:
>>> > >> >> >>>
>>> > >> >> >>> Hi!
>>> > >> >> >>>
>>> > >> >> >>> Is there an option to tika-server to concatenate the metadata
>>> > >> >> >>> and the content in the same call to localhost:9998/tika, in
>>> > >> >> >>> order to avoid a separate upload of the file just to get the
>>> > >> >> >>> metadata?
>>> > >> >> >>>
>>> > >> >> >>> Thanks!
>>> > >> >> >>> Cristi
