Can you share a triggering file and the exact output you'd expect/want?
Thank you.

On Fri, Dec 12, 2025 at 5:58 AM Sunny Thadhani <[email protected]>
wrote:

> Hello Tika Team,
>
> We need to parse files with Tika for text extraction.
>
> When parsing an HTML file (e.g. eml_parser.html), Tika doesn't handle line
> breaks properly: it uses \n instead of <br> tags. It also produces far more
> newline characters (\n) than the source file contains.
>
> These newline characters (\n) are ignored when the output is rendered in an
> HTML iframe, so the text appears on a single line, which doesn't look good.
>
> We are using Tika version 3.2.2. I have attached our Tika code, the input
> file (eml_parser.html), and the output HTML (tika_processor.html).
>
> How can we handle this in Tika?
>
>
> Best Regards,
> sthadhani
>
> This e-mail and its attachments contain confidential information from
> oppscience, which is intended only for the person or entity whose address
> is listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure, reproduction,
> or dissemination) by persons other than the intended recipient(s) is
> prohibited. If you received this e-mail in error, please notify the sender
> by phone or email immediately and delete it.
>
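A minimal sketch of one possible workaround for the rendering problem, assuming only Tika's plain-text output is needed: HTML-escape the extracted text and turn each `\n` into a `<br>` before embedding it in the iframe, collapsing runs of three or more newlines into a single blank line. `PlainTextToHtml` and `toHtml` are hypothetical names for illustration, not part of Tika's API, and this does not change how Tika itself parses the file. A CSS-only alternative is to place the HTML-escaped text in an element styled with `white-space: pre-wrap`, which makes the browser honor the newlines.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Tika): converts plain text, such as the
 * text extracted by Tika, into an HTML fragment whose line breaks survive
 * rendering in a browser or iframe.
 */
final class PlainTextToHtml {

    // Three or more consecutive newlines, each optionally followed by spaces/tabs.
    private static final Pattern EXTRA_BLANK_LINES =
            Pattern.compile("\\n[ \\t]*(?:\\n[ \\t]*){2,}");

    private PlainTextToHtml() {}

    static String toHtml(String text) {
        // Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        // Collapse long runs of blank lines down to one empty line.
        String collapsed = EXTRA_BLANK_LINES.matcher(normalized).replaceAll("\n\n");

        StringBuilder out = new StringBuilder(collapsed.length() + 16);
        for (int i = 0; i < collapsed.length(); i++) {
            char c = collapsed.charAt(i);
            switch (c) {
                // Escape characters that are special in HTML text and attributes.
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                // Make each line break visible when rendered as HTML.
                case '\n': out.append("<br>\n"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, `PlainTextToHtml.toHtml("Hello\n\n\n\nWorld")` returns `"Hello<br>\n<br>\nWorld"`.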
