Nick,
Try escaping the \n's in your regex and see if that helps:
(?s)(.*\\n\\n${boundary}\\nContent-Type: text\/plain; charset="UTF-8"\\n\\n)(.*?)(\\n\\n${boundary}.*)
From: Nick Carenza [mailto:[email protected]]
Sent: Thursday, May 18, 2017 11:27 AM
To: [email protected]
Subject: [EXT] Parsing Email Attachments
Hey Nifi-ers,
I haven't had any luck parsing emails after consuming them with POP3.
I compose a simple plain-text message in Gmail, and it comes out like this
(with many headers removed):
Delivered-To: [email protected]
Return-Path: <[email protected]>
MIME-Version: 1.0
Received: by 0.0.0.0 with HTTP; Tue, 16 May 2017 17:54:04 -0700 (PDT)
From: User <[email protected]>
Date: Tue, 16 May 2017 17:54:04 -0700
Subject: test subject
To: [email protected]
Content-Type: multipart/alternative; boundary="f403045f83d499711a054fadb980"

--f403045f83d499711a054fadb980
Content-Type: text/plain; charset="UTF-8"

test email body

--f403045f83d499711a054fadb980
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">test email body</div>

--f403045f83d499711a054fadb980--
I just want the email body and ExtractEmailAttachments doesn't seem to extract
the parts between the boundaries like I hoped it would.
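For comparison, here is a minimal sketch of the result I am after, written in plain Python outside of NiFi. It uses the standard library email package to walk the MIME parts and return the first text/plain one. The RAW message is a trimmed copy of the sample above, and the helper name plain_text_body is made up for illustration; none of this is NiFi API.

```python
from email import message_from_string

# Trimmed copy of the sample message, with CRLF line endings and the
# blank lines that separate headers from bodies.
RAW = (
    'MIME-Version: 1.0\r\n'
    'Subject: test subject\r\n'
    'Content-Type: multipart/alternative; '
    'boundary="f403045f83d499711a054fadb980"\r\n'
    '\r\n'
    '--f403045f83d499711a054fadb980\r\n'
    'Content-Type: text/plain; charset="UTF-8"\r\n'
    '\r\n'
    'test email body\r\n'
    '\r\n'
    '--f403045f83d499711a054fadb980\r\n'
    'Content-Type: text/html; charset="UTF-8"\r\n'
    '\r\n'
    '<div dir="ltr">test email body</div>\r\n'
    '\r\n'
    '--f403045f83d499711a054fadb980--\r\n'
)


def plain_text_body(raw):
    """Return the first text/plain part of a raw message, or None."""
    msg = message_from_string(raw)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes any transfer encoding and returns bytes.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace").strip()
    return None


body = plain_text_body(RAW)
```

A MIME parser handles the boundary bookkeeping itself, so no boundary attribute or regex is needed in this version.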
So instead I use ExtractEmailHeaders to additionally extract the Content-Type
header, and then pull just the boundary value out of it with an
UpdateAttribute processor configured like:
boundary:
${email.headers.content-type:substringAfter('boundary="'):substringBefore('"'):prepend('--')}
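To make the intent of that expression concrete, here is a rough Python equivalent of the three steps (take what follows boundary=", cut at the next quote, prepend --). The function name boundary_delimiter is hypothetical, and returning None when no quoted boundary is present is a choice made for this sketch, not a statement about how the Expression Language behaves.

```python
def boundary_delimiter(content_type):
    """Derive the '--boundary' delimiter line from a Content-Type value."""
    # Step 1: keep what follows boundary=" (substringAfter in the expression).
    _, found, rest = content_type.partition('boundary="')
    if not found:
        return None  # sketch-only choice: no quoted boundary parameter
    # Step 2: cut at the closing quote (substringBefore).
    value, _, _ = rest.partition('"')
    # Step 3: delimiter lines in the body are the boundary prefixed with "--".
    return "--" + value


delimiter = boundary_delimiter(
    'multipart/alternative; boundary="f403045f83d499711a054fadb980"'
)
```

Note that this only covers the quoted form shown in my sample; a header that carries the boundary without quotes would need different handling.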
Then I wrote a sweet regex for ReplaceText to clean this up:
(?s)(.*\n\n${boundary}\nContent-Type: text\/plain; charset="UTF-8"\n\n)(.*?)(\n\n${boundary}.*)
... but even though this works in regex testers and sublimetext, it seems to
have no effect in my flow.
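One possibility worth ruling out (an assumption, not a confirmed diagnosis) is line endings: internet mail is specified with CRLF line breaks, while text pasted into a regex tester or editor usually ends up with bare LF. The sketch below rebuilds the pattern in Python's re module, which is not the regex engine NiFi uses, and runs it against an LF and a CRLF copy of the sample. The names build_pattern and extract_body are made up for illustration.

```python
import re

BOUNDARY = "--f403045f83d499711a054fadb980"

LINES = [
    'Content-Type: multipart/alternative; '
    'boundary="f403045f83d499711a054fadb980"',
    '',
    BOUNDARY,
    'Content-Type: text/plain; charset="UTF-8"',
    '',
    'test email body',
    '',
    BOUNDARY,
    'Content-Type: text/html; charset="UTF-8"',
    '',
    '<div dir="ltr">test email body</div>',
    '',
    BOUNDARY + '--',
]
lf_message = "\n".join(LINES)      # what a regex tester typically sees
crlf_message = "\r\n".join(LINES)  # what a mail server typically sends


def build_pattern(boundary, newline):
    """Same three-group shape as the regex above; newline is a regex fragment."""
    b = re.escape(boundary)
    header = 'Content-Type: text/plain; charset="UTF-8"'
    return re.compile(
        "(?s)(.*" + newline + newline + b + newline
        + header + newline + newline + ")"
        + "(.*?)"
        + "(" + newline + newline + b + ".*)"
    )


def extract_body(pattern, message):
    """Return capture group 2 (the text/plain body), or None if no match."""
    m = pattern.match(message)
    return m.group(2) if m else None


strict = build_pattern(BOUNDARY, r"\n")        # LF only, as in the regex above
tolerant = build_pattern(BOUNDARY, r"\r?\n")   # accepts LF or CRLF

lf_result = extract_body(strict, lf_message)
crlf_strict_result = extract_body(strict, crlf_message)
crlf_tolerant_result = extract_body(tolerant, crlf_message)
```

In this sketch the LF-only pattern finds the body in the LF copy but does not match the CRLF copy at all, while the \r?\n variant matches both. If the flow file content really does contain CRLF, that alone would explain a pattern that works in a tester and has no effect in the flow.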
Anyone have any insight on this?
Thanks,
Nick