Isn’t that what /rmeta does?
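
For what it's worth, here is a minimal sketch of how /rmeta could be used to get metadata and plain text for every file in an archive from a single upload. It assumes a tika-server listening on http://localhost:9998 (as elsewhere in this thread) and uses the /rmeta/text variant so that X-TIKA:content holds plain text rather than XHTML. The helper names fetch_rmeta and flatten are made up for illustration and are not part of Tika.

```python
import json
import urllib.request

# Key under which tika-server's /rmeta output stores the extracted content.
CONTENT_KEY = "X-TIKA:content"


def fetch_rmeta(path, base_url="http://localhost:9998"):
    """PUT a file to /rmeta/text and return the parsed JSON list.

    base_url is an assumption; point it at your own tika-server.
    The response holds one metadata object per document: the container
    first, then each embedded file.
    """
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        base_url + "/rmeta/text",
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def flatten(docs):
    """Render each document's metadata followed by its text content."""
    parts = []
    for doc in docs:
        for key in sorted(doc):
            if key == CONTENT_KEY:
                continue
            value = doc[key]
            # Multi-valued metadata arrives as a JSON list.
            if isinstance(value, list):
                value = ", ".join(str(v) for v in value)
            parts.append(f"{key}: {value}")
        parts.append("")
        parts.append(str(doc.get(CONTENT_KEY, "")).strip())
        parts.append("")
    return "\n".join(parts)
```

Something like print(flatten(fetch_rmeta("archive.zip"))) would then give metadata plus text for the container and each embedded file. The JSON parsing overhead mentioned below still applies, since the whole response is read before anything is printed.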

>
> On Thu, May 6, 2021 at 8:03 AM Cristian Zamfir <[email protected]>
> wrote:
>
>> Cool, this will work! Looking forward to release 1.27.
>> It does not work for archives though. Is there a way to also recursively
>> get the metadata from all the files in the archive using /tika/text with
>> Accept: application/json?
>> I suppose I can always wrap around the Tika library and implement this
>> functionality, but I would have preferred to use the docker container
>> instead.
>>
>> On Thu, May 6, 2021 at 1:09 PM Tim Allison <[email protected]> wrote:
>>
>>> Reply all...
>>>
>>> On Thu, May 6, 2021 at 7:08 AM Tim Allison <[email protected]> wrote:
>>> >
>>> > Thank you for giving it a try!  Yes, there is overhead with parsing
>>> > json, and it isn't streaming.  If you want text in the content field,
>>> > try /tika/text (Accept: application/json)
>>> >
>>> > On Thu, May 6, 2021 at 5:44 AM Cristian Zamfir <[email protected]>
>>> wrote:
>>> > >
>>> > > Thanks! I checked version 1.27 and it does what is expected.
>>> > > However, the extra handling of the JSON will incur some processing
>>> > > overhead, which I think is not strictly necessary for my use case.
>>> > > Also, the content in X-TIKA:content is HTML and I would need plain text.
>>> > > Ideally there would be an option for /tika (text|body) that essentially
>>> > > does what /rmeta provides and concatenates the metadata and the data in
>>> > > the output. Something like
>>> > > `curl -H "Accept: text/plain" -H "X-Tika-meta: recursive" http://localhost:9998/tika`?
>>> > > What do you think, does it make sense?
>>> > >
>>> > > Thanks,
>>> > > Cristi
>>> > >
>>> > > On Wed, May 5, 2021 at 9:29 PM Tim Allison <[email protected]>
>>> wrote:
>>> > >>
>>> > >> All,
>>> > >>   I recently added a feature matrix page to our wiki for some of the
>>> > >> content +/- metadata endpoints in tika-server:
>>> > >>
>>> > >> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>> > >> Please take a look and let me know what you think.
>>> > >>
>>> > >>           Cheers,
>>> > >>
>>> > >>                       Tim
>>> > >>
>>> > >> On Wed, May 5, 2021 at 2:15 PM Tim Allison <[email protected]>
>>> wrote:
>>> > >> >
>>> > >> > Here’s a recent build if you want to check it out:
>>> > >> >
>>> https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/128/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.27-20210505.171622-28/tika-server-1.27-20210505.171622-28.jar
>>> > >> >
>>> > >> > On Wed, May 5, 2021 at 8:05 AM Tim Allison <[email protected]>
>>> wrote:
>>> > >> >>
>>> > >> >> My guess would be a month(ish)?  Depends on what the community
>>> > >> >> decides...
>>> > >> >>
>>> > >> >> On Wed, May 5, 2021 at 5:59 AM Cristian Zamfir <
>>> [email protected]> wrote:
>>> > >> >> >
>>> > >> >> > Great. When is 1.27 likely to be released?
>>> > >> >> >
>>> > >> >> > Thanks!
>>> > >> >> > Cristi
>>> > >> >> >
>>> > >> >> > On Wed, May 5, 2021 at 11:32 AM Tim Allison <
>>> [email protected]> wrote:
>>> > >> >> >>
>>> > >> >> >> In 1.27, there’s an Accept: application/json option for the /tika
>>> > >> >> >> endpoint that will do this.  If you can build locally or grab a
>>> > >> >> >> build from Jenkins, please give it a try before the 1.27 release.
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> See also /rmeta.
>>> > >> >> >>
>>> > >> >> >> On Wed, May 5, 2021 at 5:20 AM Cristian Zamfir <
>>> [email protected]> wrote:
>>> > >> >> >>>
>>> > >> >> >>> Hi!
>>> > >> >> >>>
>>> > >> >> >>> Is there an option in tika-server to concatenate the metadata
>>> > >> >> >>> and the content in the same call to localhost:9998/tika, in
>>> > >> >> >>> order to avoid a separate upload of the file just to get the
>>> > >> >> >>> metadata?
>>> > >> >> >>>
>>> > >> >> >>> Thanks!
>>> > >> >> >>> Cristi
>>>
>>
