Great, that works!

On Thu, May 6, 2021 at 2:47 PM Tim Allison <[email protected]> wrote:
> We’ve fixed writelimit in rmeta for 1.27, but not much else has changed.
> Unless you need that to work, you should be good to go w 1.26.

I assume writelimit is an optional parameter so it should not matter.

> Is this of any use in your choice of endpoint?
>
> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared

Yes, this was very useful. I was indeed expecting rmeta to do what I need, but it did not return anything when using Accept: text/plain; it needed JSON. I should probably have read the manual better ;)
In any case, for people like me, one suggestion for the table is to clarify that for rmeta in the table header.

Thanks again,
Cristi

> On Thu, May 6, 2021 at 8:30 AM Cristian Zamfir <[email protected]> wrote:
>
>> You're right, in version 1.27 `curl -H "Accept: application/json" http://localhost:9998/rmeta/text`
>> does exactly what I need, thanks!
>>
>> On Thu, May 6, 2021 at 2:21 PM Tim Allison <[email protected]> wrote:
>>
>>> Isn’t that what /rmeta does?
>>>
>>>> On Thu, May 6, 2021 at 8:03 AM Cristian Zamfir <[email protected]> wrote:
>>>>
>>>>> Cool, this will work! Looking forward to release 1.27.
>>>>> It does not work for archives, though. Is there a way to also recursively
>>>>> get the metadata from all the files in the archive using /tika/text with
>>>>> Accept: application/json?
>>>>> I suppose I can always wrap around the Tika library and implement this
>>>>> functionality, but I would have preferred to use the docker container
>>>>> instead.
>>>>>
>>>>> On Thu, May 6, 2021 at 1:09 PM Tim Allison <[email protected]> wrote:
>>>>>
>>>>>> Reply all...
>>>>>>
>>>>>> On Thu, May 6, 2021 at 7:08 AM Tim Allison <[email protected]> wrote:
>>>>>> >
>>>>>> > Thank you for giving it a try! Yes, there is overhead with parsing
>>>>>> > JSON, and it isn't streaming. If you want text in the content field,
>>>>>> > try /tika/text (Accept: application/json)
>>>>>> >
>>>>>> > On Thu, May 6, 2021 at 5:44 AM Cristian Zamfir <[email protected]> wrote:
>>>>>> > >
>>>>>> > > Thanks! I checked version 1.27 and it does what is expected.
>>>>>> > > However, the extra handling of the JSON will incur some processing
>>>>>> > > overhead, which is not strictly necessary for my use case, I think.
>>>>>> > > Also, the content in X-TIKA:content is HTML and I would need plain text.
>>>>>> > > What would be ideal would be an option to /tika (text|body) to
>>>>>> > > essentially do what /rmeta provides and concatenate in the output the
>>>>>> > > metadata and the data. Something like `curl -H "Accept: text/plain" -H
>>>>>> > > "X-Tika-meta: recursive" http://localhost:9998/tika`? What do you
>>>>>> > > think, does it make sense?
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > Cristi
>>>>>> > >
>>>>>> > > On Wed, May 5, 2021 at 9:29 PM Tim Allison <[email protected]> wrote:
>>>>>> > >>
>>>>>> > >> All,
>>>>>> > >> I recently added a feature matrix page to our wiki for some of the
>>>>>> > >> content +/- metadata endpoints in tika-server:
>>>>>> > >> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>>> > >> Please take a look and let me know what you think.
>>>>>> > >>
>>>>>> > >> Cheers,
>>>>>> > >>
>>>>>> > >> Tim
>>>>>> > >>
>>>>>> > >> On Wed, May 5, 2021 at 2:15 PM Tim Allison <[email protected]> wrote:
>>>>>> > >> >
>>>>>> > >> > Here’s a recent build if you want to check it out:
>>>>>> > >> > https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/128/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.27-20210505.171622-28/tika-server-1.27-20210505.171622-28.jar
>>>>>> > >> >
>>>>>> > >> > On Wed, May 5, 2021 at 8:05 AM Tim Allison <[email protected]> wrote:
>>>>>> > >> >>
>>>>>> > >> >> My guess would be a month(ish)? Depends on what the community decides...
>>>>>> > >> >>
>>>>>> > >> >> On Wed, May 5, 2021 at 5:59 AM Cristian Zamfir <[email protected]> wrote:
>>>>>> > >> >> >
>>>>>> > >> >> > Great. When is 1.27 likely to be released?
>>>>>> > >> >> >
>>>>>> > >> >> > Thanks!
>>>>>> > >> >> > Cristi
>>>>>> > >> >> >
>>>>>> > >> >> > On Wed, May 5, 2021 at 11:32 AM Tim Allison <[email protected]> wrote:
>>>>>> > >> >> >>
>>>>>> > >> >> >> In 1.27, there’s an Accept: application/json option for the
>>>>>> > >> >> >> /tika endpoint that will do this. If you can build locally or
>>>>>> > >> >> >> grab a build from Jenkins, please give it a try before the
>>>>>> > >> >> >> 1.27 release.
>>>>>> > >> >> >>
>>>>>> > >> >> >> See also /rmeta.
>>>>>> > >> >> >>
>>>>>> > >> >> >> On Wed, May 5, 2021 at 5:20 AM Cristian Zamfir <[email protected]> wrote:
>>>>>> > >> >> >>>
>>>>>> > >> >> >>> Hi!
>>>>>> > >> >> >>>
>>>>>> > >> >> >>> Is there an option to tika-server to concatenate the metadata
>>>>>> > >> >> >>> and the content in the same call to localhost:9998/tika, in
>>>>>> > >> >> >>> order to avoid a separate upload of the file just to get the
>>>>>> > >> >> >>> metadata?
>>>>>> > >> >> >>>
>>>>>> > >> >> >>> Thanks!
>>>>>> > >> >> >>> Cristi
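
For anyone landing on this thread later, here is a minimal Python sketch of the single-upload approach discussed above: send the file once to `/rmeta/text` (or `/tika/text`) with `Accept: application/json`, then split metadata from text on the client. It is only a sketch under assumptions: the server address is the `localhost:9998` used in the thread, the upload is assumed to be a PUT, `/rmeta` is assumed to return a JSON array with one object per embedded document while `/tika` returns a single object, and the helper names `build_request`, `split_metadata_and_text` and `fetch` are hypothetical, not part of Tika.

```python
import json
import urllib.request

# Assumed tika-server address, taken from the curl examples in the thread.
TIKA_URL = "http://localhost:9998"


def build_request(data: bytes, endpoint: str = "/rmeta/text") -> urllib.request.Request:
    """Build a PUT request that asks tika-server for JSON output."""
    return urllib.request.Request(
        TIKA_URL + endpoint,
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def split_metadata_and_text(payload: str):
    """Return a list of (metadata, text) pairs from a JSON response body.

    Assumption: the text lives under the "X-TIKA:content" key, and the
    body is either one JSON object or a JSON array of such objects.
    """
    parsed = json.loads(payload)
    docs = parsed if isinstance(parsed, list) else [parsed]
    pairs = []
    for doc in docs:
        meta = dict(doc)  # copy so the parsed object is left untouched
        text = meta.pop("X-TIKA:content", "")
        pairs.append((meta, text))
    return pairs


def fetch(path: str, endpoint: str = "/rmeta/text"):
    """Upload one file and return its (metadata, text) pairs (needs a running server)."""
    with open(path, "rb") as fh:
        req = build_request(fh.read(), endpoint)
    with urllib.request.urlopen(req) as resp:
        return split_metadata_and_text(resp.read().decode("utf-8"))
```

Calling `fetch("some-archive.zip")` against a running server would then give one pair per file in the archive, if the assumptions above hold; passing `endpoint="/tika/text"` would cover the non-recursive case.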
