We’ve fixed writelimit in rmeta for 1.27, but not much else has changed. Unless you need that to work, you should be good to go with 1.26.
Is this of any use in your choice of endpoint?
https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared

On Thu, May 6, 2021 at 8:30 AM Cristian Zamfir <[email protected]> wrote:

> You're right, in version 1.27 `curl -H "Accept: application/json"
> http://localhost:9998/rmeta/text` does exactly what I need, thanks!
>
> On Thu, May 6, 2021 at 2:21 PM Tim Allison <[email protected]> wrote:
>
>> Isn’t that what /rmeta does?
>>
>>> On Thu, May 6, 2021 at 8:03 AM Cristian Zamfir <[email protected]> wrote:
>>>
>>>> Cool, this will work! Looking forward to release 1.27.
>>>> It does not work for archives though, is there a way to also get
>>>> recursively the metadata from all the files in the archive using tika/text
>>>> accept: application/json?
>>>> I suppose I can always wrap around the Tika library and implement this
>>>> functionality, but I would have preferred to use the docker container
>>>> instead.
>>>>
>>>> On Thu, May 6, 2021 at 1:09 PM Tim Allison <[email protected]> wrote:
>>>>
>>>>> Reply all...
>>>>>
>>>>> On Thu, May 6, 2021 at 7:08 AM Tim Allison <[email protected]> wrote:
>>>>> >
>>>>> > Thank you for giving it a try! Yes, there is overhead with parsing
>>>>> > json, and it isn't streaming. If you want text in the content field,
>>>>> > try /tika/text (Accept: application/json)
>>>>> >
>>>>> > On Thu, May 6, 2021 at 5:44 AM Cristian Zamfir <[email protected]> wrote:
>>>>> > >
>>>>> > > Thanks! I checked version 1.27 and it does what is expected.
>>>>> > > However, the extra handling of the JSON will incur some processing
>>>>> > > overhead - not strictly necessary for my use case I think. Also, the
>>>>> > > content in X-TIKA:content is html and I would need plain text.
>>>>> > > What would be ideal would be an option to /tika (text|body) to
>>>>> > > essentially do what /rmeta provides and concatenate in the output the
>>>>> > > metadata and the data. Something like `curl -H "Accept: text/plain" -H
>>>>> > > "X-Tika-meta: recursive" http://localhost:9998/tika` ? What do you
>>>>> > > think, does it make sense?
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Cristi
>>>>> > >
>>>>> > > On Wed, May 5, 2021 at 9:29 PM Tim Allison <[email protected]> wrote:
>>>>> > >>
>>>>> > >> All,
>>>>> > >> I recently added a feature matrix page to our wiki for some of the
>>>>> > >> content +/- metadata endpoints in tika-server:
>>>>> > >> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>> > >> Please take a look and let me know what you think.
>>>>> > >>
>>>>> > >> Cheers,
>>>>> > >>
>>>>> > >> Tim
>>>>> > >>
>>>>> > >> On Wed, May 5, 2021 at 2:15 PM Tim Allison <[email protected]> wrote:
>>>>> > >> >
>>>>> > >> > Here’s a recent build if you want to check it out:
>>>>> > >> > https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/128/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.27-20210505.171622-28/tika-server-1.27-20210505.171622-28.jar
>>>>> > >> >
>>>>> > >> > On Wed, May 5, 2021 at 8:05 AM Tim Allison <[email protected]> wrote:
>>>>> > >> >>
>>>>> > >> >> My guess would be a month(ish)? Depends on what the community decides...
>>>>> > >> >>
>>>>> > >> >> On Wed, May 5, 2021 at 5:59 AM Cristian Zamfir <[email protected]> wrote:
>>>>> > >> >> >
>>>>> > >> >> > Great. When is 1.27 likely to be released?
>>>>> > >> >> >
>>>>> > >> >> > Thanks!
>>>>> > >> >> > Cristi
>>>>> > >> >> >
>>>>> > >> >> > On Wed, May 5, 2021 at 11:32 AM Tim Allison <[email protected]> wrote:
>>>>> > >> >> >>
>>>>> > >> >> >> In 1.27, there’s an accept:application/json option for the
>>>>> > >> >> >> /tika endpoint that will do this. If you can build locally or
>>>>> > >> >> >> grab a build from Jenkins, please give it a try before the
>>>>> > >> >> >> 1.27 release.
>>>>> > >> >> >>
>>>>> > >> >> >> See also /rmeta.
>>>>> > >> >> >>
>>>>> > >> >> >> On Wed, May 5, 2021 at 5:20 AM Cristian Zamfir <[email protected]> wrote:
>>>>> > >> >> >>>
>>>>> > >> >> >>> Hi!
>>>>> > >> >> >>>
>>>>> > >> >> >>> Is there an option to tika-server to concatenate the
>>>>> > >> >> >>> metadata and the content in the same call to
>>>>> > >> >> >>> localhost:9998/tika, in order to avoid a separate upload of
>>>>> > >> >> >>> the file just to get the metadata?
>>>>> > >> >> >>>
>>>>> > >> >> >>> Thanks!
>>>>> > >> >> >>> Cristi
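
For anyone landing on this thread later, here is a rough client-side sketch of the "metadata plus plain text in one call" idea discussed above, using /rmeta/text and then flattening the JSON. It is only an illustration under assumptions: the helper names `fetch_rmeta_text` and `flatten` are made up for this example, the upload is assumed to be a PUT of the raw file bytes, and the response is assumed to be a JSON list with one object per document (container first, then embedded files), each carrying its text under `X-TIKA:content`.

```python
import json
import urllib.request

# Field that holds the extracted text, as mentioned earlier in the thread.
CONTENT_KEY = "X-TIKA:content"


def fetch_rmeta_text(path, base_url="http://localhost:9998"):
    """Upload one file to /rmeta/text and return the parsed JSON list.

    Assumption: the server accepts a PUT with the raw file bytes.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    req = urllib.request.Request(
        base_url + "/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def flatten(docs):
    """Concatenate metadata and text for every document in the list."""
    parts = []
    for doc in docs:
        # Metadata first, one "key: value" line per field, sorted by key.
        for key in sorted(k for k in doc if k != CONTENT_KEY):
            parts.append(f"{key}: {doc[key]}")
        parts.append("")
        # Then the plain-text content (may be empty for pure containers).
        parts.append((doc.get(CONTENT_KEY) or "").strip())
        parts.append("")
    return "\n".join(parts)
```

With a server running locally, `print(flatten(fetch_rmeta_text("archive.zip")))` would emit each document's metadata followed by its text, where `archive.zip` is a placeholder file name. This still pays the JSON parsing cost mentioned above and is not streaming.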
