Cool, this will work! Looking forward to the 1.27 release.
It does not work for archives, though. Is there a way to also recursively
get the metadata for all the files in the archive using /tika/text with
Accept: application/json?
I suppose I can always wrap the Tika library and implement this
functionality myself, but I would prefer to use the Docker container
instead.
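
For reference, a minimal sketch of the single-call request discussed in this thread. It assumes a tika-server 1.27 (or later) instance on localhost:9998 and assumes the JSON response is a single object whose X-TIKA:content key holds the extracted text, as described below. The helper names (build_request, split_response, extract) are made up for illustration and are not part of Tika.

```python
import json
import urllib.request

# Assumed local tika-server 1.27+; adjust host/port for your container.
TIKA_URL = "http://localhost:9998/tika/text"


def build_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request carrying the raw file bytes and asking for JSON."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def split_response(body: str):
    """Split the JSON object into (metadata dict, extracted text).

    Assumes the text lives under the "X-TIKA:content" key.
    """
    doc = json.loads(body)
    text = doc.pop("X-TIKA:content", "")
    return doc, text


def extract(path: str):
    """Upload one file once and return both its metadata and its text."""
    with open(path, "rb") as fh:
        req = build_request(fh.read())
    with urllib.request.urlopen(req) as resp:
        return split_response(resp.read().decode("utf-8"))
```

Calling extract("some-file.pdf") (a placeholder path) performs one upload and returns the metadata and the plain text together. Note that this sketch handles only the single JSON object case and does nothing special for archives.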

On Thu, May 6, 2021 at 1:09 PM Tim Allison <[email protected]> wrote:

> Reply all...
>
> On Thu, May 6, 2021 at 7:08 AM Tim Allison <[email protected]> wrote:
> >
> > Thank you for giving it a try!  Yes, there is overhead with parsing
> > json, and it isn't streaming.  If you want text in the content field,
> > try /tika/text (Accept: application/json)
> >
> > On Thu, May 6, 2021 at 5:44 AM Cristian Zamfir <[email protected]>
> wrote:
> > >
> > > Thanks! I checked version 1.27 and it does what is expected. However,
> the extra handling of the JSON will incur some processing overhead - not
> strictly necessary for my use case I think. Also, the content in
> X-TIKA:content is html and I would need plain text.
> > > What would be ideal would be an option to /tika (text|body) to
> essentially do what /rmeta provides and concatenate in the output the
> metadata and the data. Something like `curl  -H "Accept: text/plain"  -H
> "X-Tika-meta: recursive" http://localhost:9998/tika` ? What do you think,
> does it make sense?
> > >
> > > Thanks,
> > > Cristi
> > >
> > > On Wed, May 5, 2021 at 9:29 PM Tim Allison <[email protected]>
> wrote:
> > >>
> > >> All,
> > >>   I recently added a feature matrix page to our wiki for some of the
> > >> content +/- metadata endpoints in tika-server:
> > >>
> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
> .
> > >> Please take a look and let me know what you think.
> > >>
> > >>           Cheers,
> > >>
> > >>                       Tim
> > >>
> > >> On Wed, May 5, 2021 at 2:15 PM Tim Allison <[email protected]>
> wrote:
> > >> >
> > >> > Here’s a recent build if you want to check it out:
> > >> >
> https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/128/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.27-20210505.171622-28/tika-server-1.27-20210505.171622-28.jar
> > >> >
> > >> > On Wed, May 5, 2021 at 8:05 AM Tim Allison <[email protected]>
> wrote:
> > >> >>
> > >> >> My guess would be a month(ish)?  Depends on what the community
> decides...
> > >> >>
> > >> >> On Wed, May 5, 2021 at 5:59 AM Cristian Zamfir <
> [email protected]> wrote:
> > >> >> >
> > >> >> > Great. When is 1.27 likely to be released?
> > >> >> >
> > >> >> > Thanks!
> > >> >> > Cristi
> > >> >> >
> > >> >> > On Wed, May 5, 2021 at 11:32 AM Tim Allison <[email protected]>
> wrote:
> > >> >> >>
> > >> >> >> In 1.27, there’s an accept:application/json option for the
> /tika endpoint that will do this.  If you can build locally or grab a build
> from Jenkins, please give it a try before the 1.27 release.
> > >> >> >>
> > >> >> >>
> > >> >> >> See also /rmeta.
> > >> >> >>
> > >> >> >> On Wed, May 5, 2021 at 5:20 AM Cristian Zamfir <
> [email protected]> wrote:
> > >> >> >>>
> > >> >> >>> Hi!
> > >> >> >>>
> > >> >> >>> Is there an option to tika-server to concatenate the metadata
> and the content in the same call to localhost:9998/tika, in order to avoid
> a separate upload of the file just to get the metadata?
> > >> >> >>>
> > >> >> >>> Thanks!
> > >> >> >>> Cristi
>
