Awesome. It’s coming along, and whatever overhead it adds is worth the 
stress-free setup. NICE!

Now I want to take the content of the tag and put it into a variable so it’s 
easier for me to work with. I’m TRYING to get the variable (joke) filled with 
the content of the tag. I can write it out to a file just fine, but I’m 
trying to avoid a bunch of file I/O overhead.


http://prntscr.com/bisfy9


-Sven

Sent from Mail for Windows 10

From: Simon Elliston Ball
Sent: Monday, June 20, 2016 1:52 PM
To: [email protected]
Cc: Lee Laim
Subject: Re: GetHTTP->ExtractText (Regex/User problem?)

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
 is something of a classic on this subject. 

I would recommend using EvaluateXPath/EvaluateXQuery or GetHTMLElement 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.GetHTMLElement/index.html
These may be a little heavier on the processing, but will certainly save you a 
lot of parsing problems. GetHTMLElement lets you use CSS selectors against 
HTML, which is a more intuitive and robust way to parse it.
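
To illustrate the parser-over-regex idea outside NiFi, here is a minimal sketch using only Python's standard-library html.parser. It is not how GetHTMLElement is implemented; the class name DivContentExtractor and the sample markup are made up for illustration. It collects the text of the first div whose class list includes "content", even when that div carries extra classes or nested tags, two cases where a fixed regex tends to break.

```python
from html.parser import HTMLParser


class DivContentExtractor(HTMLParser):
    """Collects the text inside the first <div> whose class list has "content"."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # nesting depth of <div> tags inside the target div
        self.parts = []    # text fragments found inside the target div
        self.done = False  # set once the target div has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth > 0:
            # A nested <div> inside the target: track it so we know when we leave.
            self.depth += 1
        elif "content" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


# Hypothetical sample page: extra class and an inline <b> tag inside the div.
html = ('<div id="a"><div class="content main">this is the '
        '<b>content</b> I want to pull</div></div>')

parser = DivContentExtractor()
parser.feed(html)
joke = "".join(parser.parts)
print(joke)
```

The extracted text ends up in a plain variable (joke here), with no intermediate file.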

Simon

On 20 Jun 2016, at 18:43, Sven Davison <[email protected]> wrote:

I had tried that but got a NULL value result.  Is there a setting w/in the 
extractor that I need to change too?
 
 
 
-Sven
 
From: Lee Laim
Sent: Monday, June 20, 2016 12:56 PM
To: [email protected]
Subject: Re: GetHTTP->ExtractText (Regex/User problem?)
 
Hi Sven, 
 
give this a try:
 
<div class="content">(.*?)<\/div>
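
A quick way to sanity-check a pattern like this outside NiFi is to run it against a sample string. The sketch below uses Python's re module, whose syntax differs in places from the Java regex engine NiFi uses, so treat it as an approximation; the sample HTML is made up. It also shows one possible (unconfirmed) cause of an empty result: a pattern pasted with typographic quotes will not match markup that uses straight quotes.

```python
import re

# Made-up sample page body for illustration.
html = ('<html><body><h3>Title</h3>'
        '<div class="content">this is the content I want to pull</div>'
        '</body></html>')

# Non-greedy capture group; re.DOTALL lets "." span line breaks inside the div.
pattern = re.compile(r'<div class="content">(.*?)</div>', re.DOTALL)
match = pattern.search(html)
joke = match.group(1) if match else None
print(joke)

# Same pattern, but with typographic (curly) quotes around the class value.
# These are different characters from the straight quotes in the markup,
# so the search finds nothing.
curly_pattern = re.compile('<div class=\u201dcontent\u201d>(.*?)</div>', re.DOTALL)
curly_match = curly_pattern.search(html)
print(curly_match)
```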
 
 
 
On Mon, Jun 20, 2016 at 10:25 AM, Sven Davison <[email protected]> wrote:
I have looked at the example for extracting text. I saw that the example pulls 
the content between the <title> tags. I’ve changed it to pull from the <h3> 
tags w/o problem. The problem I’m having is pulling from something a bit more 
specific. I’m sure the problem is with my understanding/usage of REGEX.
 
I’m trying to pull the content from this example.
 
<div class="content">this is the content I want to pull</div>
 
Any help would be super awesome. I’ve been banging my head for a bit here.
 
 
 
-Sven
 

