Can you share a triggering file and the exact output you'd expect/want? Thank you.
On Fri, Dec 12, 2025 at 5:58 AM Sunny Thadhani <[email protected]> wrote:

> Hello Tika Team,
>
> We have a requirement to parse files in Tika for text extraction.
>
> When parsing an HTML file (e.g. eml_parser.html), Tika doesn't parse it
> properly: it uses \n instead of <br> tags for line breaks, and it emits
> far more newline characters (\n) than the source file contains.
>
> These newline characters (\n) are ignored in an HTML iframe, so the text
> renders on a single line, which doesn't look good.
>
> We are using Tika version 3.2.2. I have attached the Tika code, the input
> file (eml_parser.html), and the output HTML (tika_processor.html).
>
> How can we handle this in Tika?
>
> Best Regards,
> sthadhani
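In the meantime, here is one possible workaround on the display side, offered only as a rough sketch. It assumes the text being rendered in the iframe is the plain-text output of the parse, and it does not call any Tika API: it simply escapes the text and collapses each run of line breaks into a single <br>. The class name NewlineToBr and the method toHtml are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika): prepares extracted plain text
// for display inside an HTML page or iframe.
final class NewlineToBr {

    // Optional trailing spaces/tabs, one line break (\R covers \n, \r\n, \r),
    // then any further whitespace, including additional line breaks.
    private static final Pattern BREAK_RUN = Pattern.compile("[ \\t]*\\R\\s*");

    private NewlineToBr() {
    }

    static String toHtml(String text) {
        // Escape markup characters first so the extracted text cannot
        // inject tags into the page; '&' must be replaced before '<' and '>'.
        String escaped = text.trim()
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        // Collapse every run of line breaks into a single <br>.
        return BREAK_RUN.matcher(escaped).replaceAll("<br>\n");
    }
}
```

Calling NewlineToBr.toHtml(extractedText) before writing the result into the iframe should give one visible line break per run of newlines. Note that this deliberately flattens blank lines, so paragraph spacing is lost; if the original spacing matters, styling the container with the CSS rule white-space: pre-wrap is an alternative that needs no post-processing. Whether either approach fits depends on the triggering file and on which content handler the attached code uses.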
