Re: GetHTTP->ExtractText (Regex/User problem?)

Simon Elliston Ball Mon, 20 Jun 2016 15:12:07 -0700

If you use GetHTMLElement and set the Destination property to 
flowfile-attribute, the content is stored as a flow file attribute that you can 
use later, for example in Expression Language. Is that what you're after?


Simon
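
To illustrate the parser-based approach outside NiFi, here is a minimal Python 
sketch using only the standard library. It is not how GetHTMLElement works 
internally; the class name ContentDivExtractor, the helper extract_content, and 
the sample HTML are assumptions made up for this example. It collects the text 
inside the first div whose class list contains "content", which is roughly what 
the CSS selector div.content would select.

```python
from html.parser import HTMLParser


class ContentDivExtractor(HTMLParser):
    """Collect the text inside the first <div class="content"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth of <div> tags inside the target
        self.done = False   # True once the target div has been closed
        self.parts = []     # text fragments found inside the target

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside the target
        elif "content" in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_content(html):
    """Return the stripped text of the first div with class 'content'."""
    parser = ContentDivExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical page, modelled on the snippet quoted later in this thread.
sample = (
    '<html><body><h3>Joke</h3>'
    '<div class="content">this is the content I want to pull</div>'
    '</body></html>'
)
joke = extract_content(sample)
print(joke)
```

Because the parser tracks element nesting, nested divs or extra attributes on 
the tag do not break the extraction the way they can with a regex.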

> On 20 Jun 2016, at 19:50, Sven Davison <[email protected]> wrote:
> 
> Awesome. It’s coming along, and the overhead it might have is worth the 
> stress-free setup. NICE!
>  
> Now I want to take the content of it and put it into a variable so it’s 
> easier for me to understand how to use it. I’m TRYING to get the variable 
> (joke) filled with the content of the tag. I can put it out to a file just 
> fine, but I’m trying to avoid a bunch of file I/O overhead.
>  
>  
> http://prntscr.com/bisfy9
>  
>  
> -Sven
>  
> Sent from Mail for Windows 10
>  
> From: Simon Elliston Ball
> Sent: Monday, June 20, 2016 1:52 PM
> To: [email protected]
> Cc: Lee Laim
> Subject: Re: GetHTTP->ExtractText (Regex/User problem?)
>  
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
>  is something of a classic on this subject. 
>  
> I would recommend using EvaluateXPath/EvaluateXQuery or GetHTMLElement  
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.GetHTMLElement/index.html
> These may be a little heavier on processing, but will certainly save you 
> a lot of problems with parsing. GetHTMLElement lets you use CSS selectors 
> against HTML, which is a more intuitive and robust way to parse it.
>  
> Simon
>  
> On 20 Jun 2016, at 18:43, Sven Davison <[email protected]> wrote:
>  
> I had tried that but got a NULL value result.  Is there a setting w/in the 
> extractor that I need to change too?
>  
>  
>  
> -Sven
> Sent from Mail for Windows 10
>  
> From: Lee Laim
> Sent: Monday, June 20, 2016 12:56 PM
> To: [email protected]
> Subject: Re: GetHTTP->ExtractText (Regex/User problem?)
>  
> Hi Sven, 
>  
> give this a try:
>  
> <div class="content">(.*?)<\/div>
>  
>  
>  
> On Mon, Jun 20, 2016 at 10:25 AM, Sven Davison <[email protected]> wrote:
> I have looked at the example for extracting text. I saw that the example pulls 
> the content between the <title> tags. I’ve changed it to pull from the <h3> 
> tags w/o problem. The problem I’m having is pulling from something a bit more 
> specific. I’m sure the problem is with my understanding/usage of REGEX.
>  
> I’m trying to pull the content from this example.
>  
> <div class="content">this is the content I want to pull</div>
>  
> Any help would be super awesome. I’ve been banging my head for a bit here.
>  
>  
>  
> -Sven
>  
> Sent from Mail for Windows 10
>  
>
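
The non-greedy pattern suggested in this thread can be checked outside NiFi 
with a short Python sketch. This uses Python's re module rather than the Java 
regex engine that NiFi runs on, and the sample HTML is an assumption. It also 
shows one possible cause of an empty result: typographic (curly) quotes, which 
mail clients often substitute for straight ones, do not match the straight 
quotes found in real HTML. Whether that was the actual cause here is not known 
from the thread.

```python
import re

# Hypothetical page content using straight quotes, as real HTML would.
html = '<p>intro</p><div class="content">this is the content I want to pull</div>'

# Pattern with straight quotes; (.*?) is a non-greedy capture group.
straight = re.compile(r'<div class="content">(.*?)</div>', re.DOTALL)

# Same pattern with curly quotes (U+201D), as a mail client might render it.
curly = re.compile('<div class=\u201dcontent\u201d>(.*?)</div>', re.DOTALL)

match = straight.search(html)
joke = match.group(1) if match else None
print(joke)                 # text captured by group 1
print(curly.search(html))   # no match, because the quote characters differ
```

Even when it matches, a pattern like this stops at the first closing div tag, 
so it truncates content that contains nested divs.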
