Sorry, I see now that that was the triggering file. Got it. What exactly do you expect/want? Do you want <br> for new lines? Or do you want a single \n per line break, and you're currently getting too many?
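For reference, either outcome asked about above can be approximated by post-processing the extracted text, whichever handler produced it. This is only a minimal sketch in plain Java and is not part of Tika's API: the class NewlineNormalizer and its methods collapseNewlines and toHtmlBreaks are hypothetical names made up for illustration. It assumes the extracted text is already available as a String.

```java
/**
 * Hypothetical helper (not a Tika class): post-processes extracted plain
 * text so that line breaks survive when the text is shown inside HTML.
 */
final class NewlineNormalizer {

    private NewlineNormalizer() {
    }

    /**
     * Collapses every whitespace run that contains at least one line break
     * into a single "\n", and drops leading/trailing whitespace.
     * Note: indentation at the start of a line is lost as a side effect.
     */
    static String collapseNewlines(String text) {
        return text.trim().replaceAll("\\s*\\n\\s*", "\n");
    }

    /**
     * Escapes the characters that are special in HTML text content, then
     * turns each "\n" into "<br>" followed by "\n" so a browser or iframe
     * renders the break instead of ignoring it.
     */
    static String toHtmlBreaks(String text) {
        String escaped = text
                .replace("&", "&amp;")   // must run first
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        return escaped.replace("\n", "<br>\n");
    }
}
```

Calling NewlineNormalizer.toHtmlBreaks(NewlineNormalizer.collapseNewlines(text)) would give one <br> per remaining line break. If only the surplus \n characters are the problem, collapseNewlines alone may be enough.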
Have you tried the ToHTMLContentHandler?

On Fri, Dec 12, 2025 at 9:31 AM Tim Allison <[email protected]> wrote:

> Can you share a triggering file and the exact output you'd expect/want?
> Thank you.
>
> On Fri, Dec 12, 2025 at 5:58 AM Sunny Thadhani <[email protected]>
> wrote:
>
>> Hello Tika Team,
>>
>> We have a requirement of parsing files in Tika for text extraction.
>>
>> When parsing html file (Ex. eml_parser.html), it doesn't properly parse
>> and use \n instead of <br> tags for line breaks. Also, it gives way more
>> new line characters (\n) then the source file.
>>
>> These new line characters(\n) are ignored in html iframe and it renders
>> text in the same line which doesn't look good.
>>
>> We are using Tika version 3.2.2 and I have also attached Tika Code, input
>> file(eml_parser.html) and output html(tika_processor.html)
>>
>> How can we handle this in Tika ?
>>
>>
>> Best Regards,
>> sthadhani
>>
>> This e-mail and its attachments contain confidential information from
>> oppscience, which is intended only for the person or entity whose address
>> is listed above. Any use of the information contained herein in any way
>> (including, but not limited to, total or partial disclosure, reproduction,
>> or dissemination) by persons other than the intended recipient(s) is
>> prohibited. If you received this e-mail in error, please notify the sender
>> by phone or email immediately and delete it.
>>
>
