Nick,

Try escaping your \n's and see if that helps.

(?s)(.*\\n\\n${boundary}\\nContent-Type: text\/plain; charset="UTF-8"\\n\\n)(.*?)(\\n\\n${boundary}.*)

From: Nick Carenza [mailto:[email protected]]
Sent: Thursday, May 18, 2017 11:27 AM
To: [email protected]
Subject: [EXT] Parsing Email Attachments

Hey Nifi-ers,

I haven't had any luck parsing emails after consuming them with POP3.

I am composing a simple plain-text message in Gmail, and it comes out like 
this (with many headers removed):

Delivered-To: [email protected]
Return-Path: <[email protected]>
MIME-Version: 1.0
Received: by 0.0.0.0 with HTTP; Tue, 16 May 2017 17:54:04 -0700 (PDT)
From: User <[email protected]>
Date: Tue, 16 May 2017 17:54:04 -0700
Subject: test subject
To: [email protected]
Content-Type: multipart/alternative; boundary="f403045f83d499711a054fadb980"

--f403045f83d499711a054fadb980
Content-Type: text/plain; charset="UTF-8"

test email body

--f403045f83d499711a054fadb980
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">test email body</div>

--f403045f83d499711a054fadb980--

I just want the email body and ExtractEmailAttachments doesn't seem to extract 
the parts between the boundaries like I hoped it would.
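
For comparison, here is a minimal sketch of the result being asked for, done 
outside NiFi with Python's standard email package rather than with any NiFi 
processor. The RAW sample is a trimmed copy of the message above, and 
plain_text_body is a hypothetical helper name, not something from the flow.

```python
from email import message_from_string
from email.policy import default

# Trimmed copy of the sample message from this thread (most headers omitted).
RAW = """\
MIME-Version: 1.0
Subject: test subject
Content-Type: multipart/alternative; boundary="f403045f83d499711a054fadb980"

--f403045f83d499711a054fadb980
Content-Type: text/plain; charset="UTF-8"

test email body

--f403045f83d499711a054fadb980
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">test email body</div>

--f403045f83d499711a054fadb980--
"""


def plain_text_body(raw: str) -> str:
    """Return the text/plain part of a message, or '' if there is none."""
    # policy=default yields an EmailMessage, which provides get_body().
    msg = message_from_string(raw, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content().strip() if part is not None else ""


print(plain_text_body(RAW))
```

The parser handles the boundary lookup itself, so no separate boundary 
attribute is needed in this sketch.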

So instead I use ExtractEmailHeaders to additionally extract the Content-Type 
header, from which I then retrieve just the boundary value with an 
UpdateAttribute processor configured like:

boundary: 
${email.headers.content-type:substringAfter('boundary="'):substringBefore('"'):prepend('--')}
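
As a rough illustration of what that expression is meant to produce, the same 
three steps can be sketched in Python. This only mirrors the happy path; 
boundary_delimiter is a hypothetical name, and a header without a quoted 
boundary parameter is not handled the way NiFi would handle it.

```python
def boundary_delimiter(content_type: str) -> str:
    """Mimic substringAfter / substringBefore / prepend on a Content-Type value."""
    # substringAfter('boundary="'): keep what follows the first occurrence.
    after = content_type.partition('boundary="')[2]
    # substringBefore('"'): keep what precedes the closing quote.
    value = after.partition('"')[0]
    # prepend('--'): MIME delimiter lines start with two hyphens.
    return "--" + value


header = 'multipart/alternative; boundary="f403045f83d499711a054fadb980"'
print(boundary_delimiter(header))
```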

Then I wrote a sweet regex for ReplaceText to clean this up:

(?s)(.*\n\n${boundary}\nContent-Type: text\/plain; charset="UTF-8"\n\n)(.*?)(\n\n${boundary}.*)

... but even though this works in regex testers and Sublime Text, it seems to 
have no effect in my flow.
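
For reference, the intended behaviour of the regex can be sketched in Python 
against an LF-only copy of the sample. This assumes the replacement keeps only 
the second capture group, which the message does not actually state, and it 
uses Python's re module rather than the Java regex engine NiFi runs on. 
SAMPLE and extract_plain_part are hypothetical names.

```python
import re

# LF-only, trimmed copy of the sample message from this thread.
SAMPLE = (
    'Content-Type: multipart/alternative; boundary="f403045f83d499711a054fadb980"\n'
    "\n"
    "--f403045f83d499711a054fadb980\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "\n"
    "test email body\n"
    "\n"
    "--f403045f83d499711a054fadb980\n"
    'Content-Type: text/html; charset="UTF-8"\n'
    "\n"
    '<div dir="ltr">test email body</div>\n'
    "\n"
    "--f403045f83d499711a054fadb980--\n"
)


def extract_plain_part(message: str, boundary: str) -> str:
    """Replace the whole message with the text between the plain-text part
    header and the next boundary; return it unchanged if nothing matches."""
    b = re.escape(boundary)  # treat the boundary as literal text
    pattern = re.compile(
        rf'(?s)(.*\n\n{b}\nContent-Type: text/plain; charset="UTF-8"\n\n)'
        rf"(.*?)(\n\n{b}.*)"
    )
    return pattern.sub(r"\2", message, count=1)


print(extract_plain_part(SAMPLE, "--f403045f83d499711a054fadb980"))
```

Note that the pattern only matches bare \n line breaks. Internet messages are 
specified with CRLF line endings, so if the flowfile content still carries 
\r\n, that would be one thing worth checking; it is a possibility, not a 
confirmed cause.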

Anyone have any insight on this?

Thanks,
Nick
