Great, that works!
On Thu, May 6, 2021 at 2:47 PM Tim Allison <[email protected]> wrote:

> We’ve fixed writelimit in rmeta for 1.27, but not much else has changed.
> Unless you need that to work, you should be good to go with 1.26.
>

I assume writelimit is an optional parameter so it should not matter.

>

> Is this of any use in your choice of endpoint?
>
> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>

Yes, this was very useful. I was indeed expecting rmeta to do what I need,
but it did not return anything with Accept: text/plain; it needed
Accept: application/json. I should probably have read the manual better ;)
In any case, for people like me, one suggestion is to state in the table
header that rmeta only returns JSON.
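
A minimal Python sketch of the single-call approach discussed in this thread, in case it helps others: one PUT to /rmeta/text with Accept: application/json, then the JSON is flattened into metadata lines followed by plain text. It assumes tika-server 1.27 listening on localhost:9998, and that the response is a JSON list with one object per document (the container first, then any embedded files) with the extracted text under X-TIKA:content. The names fetch_rmeta and flatten are made up for this example and are not part of Tika.

```python
import json
import urllib.request

# Key under which tika-server returns the extracted text in each JSON object.
CONTENT_KEY = "X-TIKA:content"


def fetch_rmeta(path, url="http://localhost:9998/rmeta/text"):
    """Upload one file and return the parsed JSON list from /rmeta/text.

    The URL is an assumption (default tika-server port on localhost).
    """
    with open(path, "rb") as handle:
        payload = handle.read()
    request = urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def flatten(docs):
    """Concatenate metadata and text for every document in the list."""
    parts = []
    for doc in docs:
        for key in sorted(doc):
            if key == CONTENT_KEY:
                continue
            value = doc[key]
            # Multi-valued metadata arrives as a JSON array.
            if isinstance(value, list):
                value = ", ".join(str(item) for item in value)
            parts.append(f"{key}: {value}")
        parts.append("")  # blank line between metadata and text
        parts.append((doc.get(CONTENT_KEY) or "").strip())
        parts.append("")  # blank line after each document
    return "\n".join(parts)
```

With a server running, `print(flatten(fetch_rmeta("archive.zip")))` would print the metadata and text of the archive and of each file inside it from a single upload (archive.zip is a placeholder file name).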

Thanks again,
Cristi


>
>
>
>
> On Thu, May 6, 2021 at 8:30 AM Cristian Zamfir <[email protected]>
> wrote:
>
>> You're right, in version 1.27
>> `curl -H "Accept: application/json" http://localhost:9998/rmeta/text`
>> does exactly what I need, thanks!
>>
>>
>>
>> On Thu, May 6, 2021 at 2:21 PM Tim Allison <[email protected]> wrote:
>>
>>> Isn’t that what /rmeta does?
>>>
>>>>
>>>> On Thu, May 6, 2021 at 8:03 AM Cristian Zamfir <[email protected]>
>>>> wrote:
>>>>
>>>>> Cool, this will work! Looking forward to release 1.27.
>>>>> It does not work for archives though. Is there a way to also
>>>>> recursively get the metadata from all the files in the archive using
>>>>> /tika/text with Accept: application/json?
>>>>> I suppose I can always wrap around the Tika library and implement this
>>>>> functionality, but I would have preferred to use the docker container
>>>>> instead.
>>>>>
>>>>> On Thu, May 6, 2021 at 1:09 PM Tim Allison <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Reply all...
>>>>>>
>>>>>> On Thu, May 6, 2021 at 7:08 AM Tim Allison <[email protected]>
>>>>>> wrote:
>>>>>> >
>>>>>> > Thank you for giving it a try!  Yes, there is overhead with parsing
>>>>>> > json, and it isn't streaming.  If you want text in the content field,
>>>>>> > try /tika/text (Accept: application/json)
>>>>>> >
>>>>>> > On Thu, May 6, 2021 at 5:44 AM Cristian Zamfir <[email protected]> wrote:
>>>>>> > >
>>>>>> > > Thanks! I checked version 1.27 and it does what is expected.
>>>>>> > > However, the extra handling of the JSON will incur some processing
>>>>>> > > overhead, which I think is not strictly necessary for my use case.
>>>>>> > > Also, the content in X-TIKA:content is HTML and I would need plain
>>>>>> > > text.
>>>>>> > > What would be ideal is an option for /tika (text|body) that
>>>>>> > > essentially does what /rmeta provides and concatenates the metadata
>>>>>> > > and the data in the output. Something like
>>>>>> > > `curl -H "Accept: text/plain" -H "X-Tika-meta: recursive" http://localhost:9998/tika`?
>>>>>> > > What do you think, does it make sense?
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > Cristi
>>>>>> > >
>>>>>> > > On Wed, May 5, 2021 at 9:29 PM Tim Allison <[email protected]> wrote:
>>>>>> > >>
>>>>>> > >> All,
>>>>>> > >>   I recently added a feature matrix page to our wiki for some of the
>>>>>> > >> content +/- metadata endpoints in tika-server:
>>>>>> > >>
>>>>>> > >> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>>> > >>
>>>>>> > >> Please take a look and let me know what you think.
>>>>>> > >>
>>>>>> > >>           Cheers,
>>>>>> > >>
>>>>>> > >>                       Tim
>>>>>> > >>
>>>>>> > >> On Wed, May 5, 2021 at 2:15 PM Tim Allison <[email protected]> wrote:
>>>>>> > >> >
>>>>>> > >> > Here’s a recent build if you want to check it out:
>>>>>> > >> >
>>>>>> https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/128/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.27-20210505.171622-28/tika-server-1.27-20210505.171622-28.jar
>>>>>> > >> >
>>>>>> > >> > On Wed, May 5, 2021 at 8:05 AM Tim Allison <[email protected]> wrote:
>>>>>> > >> >>
>>>>>> > >> >> My guess would be a month(ish)?  Depends on what the community decides...
>>>>>> > >> >>
>>>>>> > >> >> On Wed, May 5, 2021 at 5:59 AM Cristian Zamfir <[email protected]> wrote:
>>>>>> > >> >> >
>>>>>> > >> >> > Great. When is 1.27 likely to be released?
>>>>>> > >> >> >
>>>>>> > >> >> > Thanks!
>>>>>> > >> >> > Cristi
>>>>>> > >> >> >
>>>>>> > >> >> > On Wed, May 5, 2021 at 11:32 AM Tim Allison <[email protected]> wrote:
>>>>>> > >> >> >>
>>>>>> > >> >> >> In 1.27, there’s an accept:application/json option for the /tika
>>>>>> > >> >> >> endpoint that will do this.  If you can build locally or grab a build
>>>>>> > >> >> >> from Jenkins, please give it a try before the 1.27 release.
>>>>>> > >> >> >>
>>>>>> > >> >> >>
>>>>>> > >> >> >> See also /rmeta.
>>>>>> > >> >> >>
>>>>>> > >> >> >> On Wed, May 5, 2021 at 5:20 AM Cristian Zamfir <[email protected]> wrote:
>>>>>> > >> >> >>>
>>>>>> > >> >> >>> Hi!
>>>>>> > >> >> >>>
>>>>>> > >> >> >>> Is there an option in tika-server to concatenate the metadata and the
>>>>>> > >> >> >>> content in the same call to localhost:9998/tika, in order to avoid a
>>>>>> > >> >> >>> separate upload of the file just to get the metadata?
>>>>>> > >> >> >>>
>>>>>> > >> >> >>> Thanks!
>>>>>> > >> >> >>> Cristi
>>>>>>
>>>>>
