Tim - Doh! That's exactly what happened. In a misguided attempt at helpfulness, I was stripping whitespace from the parsed string, and there was a newline before it that was removed.
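To illustrate the effect, here is a minimal Ruby sketch. It is not the actual rika code, and the sample string with a leading newline is an assumption based on the explanation above, standing in for the first 8 characters Tika returned for the PDF.

```ruby
# Hypothetical stand-in for the 8 characters returned for the PDF when the
# max content length is 8. The leading newline is an assumption.
raw = "\nStoppin"

# The kind of "helpful" whitespace stripping described above.
stripped = raw.strip

puts raw.length        # 8
puts stripped.length   # 7
puts stripped.inspect  # "Stoppin"
```

The length limit is applied before the stripping, so the newline uses up one of the 8 characters and only 7 visible characters are left once it is removed.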
I will remove the stripping from the routine so its behavior does not modify the string returned by Tika. Thanks for your help.

- Keith

On Mon, Aug 14, 2023 at 10:38 PM Tim Allison <[email protected]> wrote:

> Is it possible that this is due to extra whitespace in the PDF?
>
> On Sun, Jul 30, 2023 at 2:17 PM Keith Bennett <[email protected]> wrote:
>
>> Hi, all. I am finally getting around to updating the "rika" Ruby gem for
>> interacting with Tika in JRuby, and encountered something weird. When I
>> test parsing a text file with max content length of 8, I get 8 characters
>> ("Stopping"). When I test parsing a PDF file with max content length of 8,
>> I only get 7 characters ("Stoppin"). Is this expected?
>>
>>
stoppin.pdf
Description: Adobe PDF document
