If you use GetHTMLElement and set the Destination property to flowfile-attribute, the extracted content is stored as a flow file attribute, which you can use later, for example in Expression Language. Is that what you're after?
Simon

> On 20 Jun 2016, at 19:50, Sven Davison <[email protected]> wrote:
>
> Awesome. It’s coming along, and the overhead it might have is worth the
> stress-free setup. NICE!
>
> Now I want to take the content of it and put it into a variable so it’s
> easier for me to understand how to use it. I’m TRYING to get the variable
> (joke) filled with the content of the tag. I can put it out to a file just
> fine, but I’m trying to avoid a bunch of file I/O overhead.
>
> http://prntscr.com/bisfy9
>
> -Sven
>
> Sent from Mail for Windows 10
>
> From: Simon Elliston Ball
> Sent: Monday, June 20, 2016 1:52 PM
> To: [email protected]
> Cc: Lee Laim
> Subject: Re: GetHTTP->ExtractText (Regex/User problem?)
>
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
> is something of a classic on this subject.
>
> I would recommend using the ExtractXPath/XQuery or GetHTMLElement
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.GetHTMLElement/index.html
> These may be a little heavier on the processing, but will certainly save
> you a lot of problems with parsing. GetHTMLElement lets you use CSS
> selectors against HTML, which is a more intuitive and robust way to parse
> it.
>
> Simon
>
> On 20 Jun 2016, at 18:43, Sven Davison <[email protected]> wrote:
>
> I had tried that but got a NULL value result. Is there a setting within
> the extractor that I need to change too?
>
> -Sven
>
> Sent from Mail for Windows 10
>
> From: Lee Laim
> Sent: Monday, June 20, 2016 12:56 PM
> To: [email protected]
> Subject: Re: GetHTTP->ExtractText (Regex/User problem?)
>
> Hi Sven,
>
> give this a try:
>
> <div class=”content”>(.*?)<\/div>
>
> On Mon, Jun 20, 2016 at 10:25 AM, Sven Davison <[email protected]> wrote:
>
> I have looked at the example for extracting text. I saw that the example
> pulls the content between the <title> tags. I’ve changed it to pull from
> the <h3> tags without a problem. The problem I’m having is pulling from
> something a bit more specific. I’m sure the problem is with my
> understanding/usage of REGEX.
>
> I’m trying to pull the content from this example.
>
> <div class=”content”>this is the content I want to pull</div>
>
> Any help would be super awesome. I’ve been banging my head for a bit here.
>
> -Sven
>
> Sent from Mail for Windows 10
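
For anyone reading this thread later, here is a rough sketch of the two approaches discussed above, written in plain Python rather than NiFi so it can be run on its own. It is not what ExtractText or GetHTMLElement do internally. The sample HTML, the function names, and the DivContentParser class are all made up for illustration. One assumption worth flagging: the pattern as pasted in the mail contains typographic quotes (”), and a pattern written that way only matches input that also contains typographic quotes. Whether that explains the NULL result reported above is a guess, not something the thread confirms.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page; the div is the one from the original question,
# written with straight quotes as it would appear in real HTML.
SAMPLE = (
    "<html><body><h3>Title</h3>"
    '<div class="content">this is the content I want to pull</div>'
    "</body></html>"
)

# The regex suggested in the thread, with straight quotes.
# (.*?) is non-greedy, so it stops at the first closing </div>.
PATTERN = re.compile(r'<div class="content">(.*?)</div>', re.DOTALL)


def extract_with_regex(html):
    """Return the first capture group, or None when nothing matches."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


class DivContentParser(HTMLParser):
    """Collects the text inside the first <div> whose class list has 'content'."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <div> levels deep we are inside the target
        self.found = False  # set once the target div has been opened
        self.done = False   # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            if tag == "div":
                self.depth += 1  # nested div inside the target
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "content" in classes:
                self.depth = 1
                self.found = True

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_with_parser(html):
    """Return the text of the target div, or None when it is absent."""
    parser = DivContentParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


if __name__ == "__main__":
    print(extract_with_regex(SAMPLE))
    print(extract_with_parser(SAMPLE))
```

The regex only works while the tag is written exactly as in the pattern. An extra attribute, a second class name, or different quoting makes it return nothing, whereas the parser-based version still finds the div. That is the same trade-off Simon describes: a little more processing in exchange for fewer parsing surprises.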
