Sorry. That was the triggering file. Got it.

What exactly do you expect/want? Do you want <br> tags for line breaks? Or
do you want a single \n per line break and you're getting too many?

Have you tried the ToHTMLContentHandler?
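
If you do end up with plain text and only need it to render line by line in
the iframe, here is a minimal sketch of post-processing the extracted string
yourself. It is plain Java and makes no Tika calls; the class and method
names (NewlinesToBr, escapeHtml, toHtmlLines) are made up for illustration,
and collapsing at three or more line breaks is my assumption about what
"too many" means.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Tika: prepares extracted plain text for HTML display. */
final class NewlinesToBr {

    // Three or more consecutive line breaks, allowing spaces/tabs before each one.
    private static final Pattern EXTRA_BLANK_LINES = Pattern.compile("(?:[ \\t]*\\n){3,}");

    /** Escapes the characters that would otherwise be read as markup. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Normalizes line endings, collapses runs of blank lines, and emits <br> per line break. */
    static String toHtmlLines(String text) {
        // Treat \r\n and bare \r as \n so every line break is counted once.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        // Keep at most one blank line between blocks of text.
        String collapsed = EXTRA_BLANK_LINES.matcher(normalized).replaceAll("\n\n");
        // Escape first, then add the <br> tags so they are not escaped themselves.
        return escapeHtml(collapsed).replace("\n", "<br>\n");
    }

    public static void main(String[] args) {
        String extracted = "Hello\n\n\n\nTika & <friends>\nBye";
        System.out.println(toHtmlLines(extracted));
    }
}
```

Escaping before inserting the <br> tags matters: done the other way round,
the tags themselves would be turned into &lt;br&gt; and show up as text.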

On Fri, Dec 12, 2025 at 9:31 AM Tim Allison <[email protected]> wrote:

> Can you share a triggering file and the exact output you'd expect/want?
> Thank you.
>
> On Fri, Dec 12, 2025 at 5:58 AM Sunny Thadhani <[email protected]>
> wrote:
>
>> Hello Tika Team,
>>
>> We need to parse files with Tika for text extraction.
>>
>> When parsing an HTML file (e.g. eml_parser.html), Tika doesn't parse it
>> properly: it uses \n instead of <br> tags for line breaks. It also emits
>> far more newline characters (\n) than the source file contains.
>>
>> These newline characters (\n) are ignored in an HTML iframe, so the text
>> renders on a single line, which doesn't look good.
>>
>> We are using Tika version 3.2.2. I have attached the Tika code, the input
>> file (eml_parser.html), and the output HTML (tika_processor.html).
>>
>> How can we handle this in Tika?
>>
>>
>> Best Regards,
>> sthadhani
>>
>
