Re: setMaxContentLength Behavior Differs Across Parsers?

Keith Bennett Wed, 16 Aug 2023 05:33:53 -0700

Tim -

Doh! That's exactly what happened. In a misguided attempt at helpfulness, I
was stripping whitespace from the parsed string. The PDF text began with a
newline, which counted toward the 8-character limit and was then removed,
leaving 7 characters.


I will remove the stripping from the routine so that it no longer modifies
the string returned by Tika.
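
A minimal sketch of the effect in plain Ruby, with no Tika involved. The
sample text, the leading newline in the PDF content, and the truncate helper
are assumptions for illustration only; the real truncation happens inside
Tika.

```ruby
# Hypothetical stand-in for the max content length limit:
# keep only the first `max` characters of the extracted text.
def truncate(text, max)
  text[0, max]
end

MAX = 8

# Assumed contents; the actual test files may differ.
txt_content = "Stopping by the woods"
pdf_content = "\nStopping by the woods" # extraction starts with a newline

txt_result   = truncate(txt_content, MAX) # "Stopping"
pdf_result   = truncate(pdf_content, MAX) # "\nStoppin", newline uses one of the 8
pdf_stripped = pdf_result.strip           # "Stoppin", only 7 characters remain

puts txt_result.length   # 8
puts pdf_result.length   # 8
puts pdf_stripped.length # 7
```

Without the strip call, both results are 8 characters long, which matches
what the limit promises.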

Thanks for your help.

- Keith





On Mon, Aug 14, 2023 at 10:38 PM Tim Allison <[email protected]> wrote:

> Is it possible that this is due to extra whitespace in the PDF?
>
> On Sun, Jul 30, 2023 at 2:17 PM Keith Bennett <[email protected]>
> wrote:
>
>> Hi, all. I am finally getting around to updating the "rika" Ruby gem for
>> interacting with Tika in JRuby, and encountered something weird. When I
>> test parsing a text file with max content length of 8, I get 8 characters
>> ("Stopping"). When I test parsing a PDF file with max content length of 8,
>> I only get 7 characters ("Stoppin"). Is this expected?
>>
>>

stoppin.pdf
Description: Adobe PDF document
