A coworker and I were troubleshooting a bug in the ConsumeEWS processor where 
Unicode characters were being read as ASCII.
I figured out there was a bug in my code for ConsumeEWS and plan to fix it, but 
as part of the research I found that the way Unicode text in the email is 
written to the FlowFile is not easy to work with; in fact, the whole email 
body is hard to work with. If the message has attachments and all you want 
is the body, it's even more of a mess.

How are other users reading the email message body? Has anyone else run into 
this issue with Unicode characters?

In my scenario, the smart quotes and apostrophes that Outlook's Word editor 
auto-inserts become '?' characters, and with my fix in place they are written to 
the FlowFile in what appears to be quoted-printable encoding (each UTF-8 byte 
written as an =XX escape):

“Where there’s NiFi there is Happiness” becomes:

=E2=80=9CWhere there=E2=80=99s NiFi there is Happiness=E2=80=9D.
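The =XX sequences match quoted-printable encoding (RFC 2045) applied to the UTF-8 
bytes of each character: =E2=80=9C is U+201C (left double quotation mark), 
=E2=80=99 is U+2019 (right single quotation mark, used as an apostrophe) and 
=E2=80=9D is U+201D (right double quotation mark). As a quick sanity check, here is 
a small Python sketch using only the standard `quopri` module. It is just a way to 
verify the encoding outside NiFi, not something ConsumeEWS does itself:

```python
import quopri

# The escaped text as it appeared in the FlowFile.
encoded = b"=E2=80=9CWhere there=E2=80=99s NiFi there is Happiness=E2=80=9D."

# Reverse the quoted-printable step: each =XX becomes the single byte 0xXX.
utf8_bytes = quopri.decodestring(encoded)

# Those bytes are UTF-8, so decoding them restores the original characters.
text = utf8_bytes.decode("utf-8")
print(text)  # prints “Where there’s NiFi there is Happiness”.
```

In other words, the content is not lost; it just needs a quoted-printable decode 
followed by a UTF-8 decode before it is readable.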

Is there a need for a new email processor that parses the message in the 
FlowFile and extracts just the message body, decoded to plain text?
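To illustrate what such extraction involves, here is a minimal Python sketch. It 
assumes the FlowFile content is a complete raw MIME message, and it uses Python's 
standard `email` package rather than any NiFi API; `extract_body` is a hypothetical 
helper name, and the sample message is made up to resemble the case described 
above (a quoted-printable UTF-8 body plus an attachment):

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_body(raw_message: bytes) -> str:
    """Hypothetical helper: return the preferred text body of a raw MIME message.

    Assumes the input is a complete MIME message (e.g. FlowFile content).
    """
    # policy.default makes the parser return EmailMessage objects,
    # which provide get_body() and get_content().
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    # Prefer text/plain, fall back to text/html; parts marked as
    # attachments are never chosen as the body.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    if body_part is None:
        return ""
    # get_content() undoes the transfer encoding (e.g. quoted-printable)
    # and decodes the bytes using the part's declared charset.
    return body_part.get_content()


# Made-up sample: a UTF-8 body sent as quoted-printable, plus a binary attachment.
sample = EmailMessage()
sample["Subject"] = "Unicode test"
sample.set_content("\u201cWhere there\u2019s NiFi there is Happiness\u201d.",
                   cte="quoted-printable")
sample.add_attachment(b"binary data", maintype="application",
                      subtype="octet-stream", filename="data.bin")
raw = sample.as_bytes()

print(extract_body(raw))
```

The design choice in this sketch is to let a MIME parser handle both the multipart 
structure and the transfer encoding, instead of splitting on boundaries or decoding 
=XX escapes by hand.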
