A coworker and I were troubleshooting a bug in the ConsumeEWS processor where Unicode characters were being read as ASCII. I found the bug in my ConsumeEWS code and plan to fix it, but while researching it I noticed that the way Unicode text in the email is written to the FlowFile is not easy to work with; in general, the whole email body is hard to work with. If the message has attachments and all you want is the body, it's even more of a mess.
How are other users reading the email message body? Has anyone else run into this issue with Unicode characters? In my scenario, the auto-quotes/semicolons from Outlook's Word interface become '?' characters, and with my fix in place they are written to the FlowFile in what looks like quoted-printable encoding of the UTF-8 bytes: “Where there’s NiFi there is Happiness” becomes: =E2=80=9CWhere there=E2=80=99s NiFi there is Happiness=E2=80=9D. Is there a need for a new email processor that extracts the message body by deserializing the FlowFile and reading out the body?
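To make the question concrete, here is a rough sketch of the decoding I have in mind, written in Python only because its standard library makes it short. It assumes the FlowFile content is a raw RFC 822/MIME message, which I have not confirmed for every case; the function names decode_quoted_printable and extract_body are made up for illustration and are not part of NiFi.

```python
import quopri
from email import message_from_bytes, policy


def decode_quoted_printable(text: str, charset: str = "utf-8") -> str:
    """Turn a quoted-printable string such as '=E2=80=9C...' back into text."""
    # quopri works on bytes; the encoded form is plain ASCII by design.
    raw_bytes = quopri.decodestring(text.encode("ascii"))
    return raw_bytes.decode(charset)


def extract_body(raw_message: bytes) -> str:
    """Return the decoded text body of a MIME message, skipping attachments.

    Assumption: raw_message is the full message (headers plus body),
    as it might appear in the FlowFile content.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    # Prefer the plain-text part and fall back to HTML if there is none.
    part = msg.get_body(preferencelist=("plain", "html"))
    # get_content() undoes the transfer encoding and applies the charset.
    return part.get_content() if part is not None else ""


encoded = "=E2=80=9CWhere there=E2=80=99s NiFi there is Happiness=E2=80=9D"
decoded = decode_quoted_printable(encoded)
print(decoded)
```

A processor doing the equivalent of extract_body would hand downstream processors just the readable body, which is the part I am asking about.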