Hello,

We want <br> tags in the output instead of \n. The output also contains many unnecessary \n characters. The line breaks should follow the layout of eml_parser.html.
We also tried ToHTMLContentHandler, but it didn't resolve the issue.

________________________________
From: Tim Allison <[email protected]>
Sent: 12 December 2025 20:04
To: [email protected] <[email protected]>
Subject: Re: Tika 3.2.2 - Html file parsing doesn't process line breaks properly

Sorry. That was the triggering file. Got it.

What exactly do you expect/want? You want <br> for new lines? Or do you want a single \n for the new line and you're getting too many?

Have you tried the ToHTMLContentHandler?

On Fri, Dec 12, 2025 at 9:31 AM Tim Allison <[email protected]> wrote:

Can you share a triggering file and the exact output you'd expect/want? Thank you.

On Fri, Dec 12, 2025 at 5:58 AM Sunny Thadhani <[email protected]> wrote:

Hello Tika Team,

We have a requirement to parse files in Tika for text extraction. When parsing an HTML file (e.g. eml_parser.html), Tika doesn't handle line breaks properly: it uses \n instead of <br> tags. It also emits far more newline characters (\n) than the source file contains. These newline characters are ignored when the output is rendered in an HTML iframe, so the text runs together on a single line, which doesn't look good.

We are using Tika version 3.2.2. I have attached our Tika code, the input file (eml_parser.html), and the output HTML (tika_processor.html).

How can we handle this in Tika?

Best Regards,
sthadhani
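
For the stated goal (<br> in place of \n, with the surplus \n removed), one possible workaround outside Tika is to post-process the extracted text before placing it in the iframe. The sketch below is only an illustration under assumptions: it expects plain, unescaped text as input (for example, what a text-oriented content handler returns), it treats "too many" as more than one blank line in a row, and the names NewlineToBr and toHtmlBreaks are hypothetical and not part of any Tika API.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Tika class: converts extracted plain text
// into HTML-safe text with <br> line breaks.
final class NewlineToBr {
    // Any line terminator: \r\n, a lone \r, or \n.
    private static final Pattern LINE_END = Pattern.compile("\\r\\n?|\\n");
    // Two or more newlines, optionally separated by spaces or tabs.
    private static final Pattern BLANK_RUN = Pattern.compile("\\n(?:[ \\t]*\\n)+");

    static String toHtmlBreaks(String text) {
        // 1. Normalize all line terminators to \n.
        String s = LINE_END.matcher(text).replaceAll("\n");
        // 2. Collapse any run of blank lines to a single blank line
        //    (assumption: one blank line is the most the layout needs).
        s = BLANK_RUN.matcher(s).replaceAll("\n\n");
        // 3. Drop leading and trailing whitespace, including stray newlines.
        s = s.trim();
        // 4. Escape HTML metacharacters; '&' must be handled first.
        s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        // 5. Turn each remaining newline into <br>, keeping the \n so the
        //    generated markup stays readable in source form.
        return s.replace("\n", "<br>\n");
    }

    public static void main(String[] args) {
        String extracted = "Hello Team,\r\n\r\n\r\n\r\nFirst line\nSecond line\n\n\n";
        System.out.println(toHtmlBreaks(extracted));
    }
}
```

If the input is already HTML (for example, output from ToHTMLContentHandler), step 4 would double-escape it and should be skipped. Whether blank lines should be kept at all depends on how closely the result must match eml_parser.html.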
