Re: parsing html

Jeremy Dyer Tue, 11 Apr 2017 10:14:19 -0700

Sven,

There is also the GetHTML processor I added a while back. If the input is valid 
HTML, you should always be able to use a CSS selector to extract the value you 
need. If you can provide a sample of the HTML, I would be glad to build a flow 
for you as an example.


Jeremy

Sent from my iPhone

> On Apr 11, 2017, at 1:01 PM, Andy LoPresto <[email protected]> wrote:
> 
> Sven,
> 
> Currently I would recommend using ExecuteScript and simply streaming & 
> slicing the content bytes at line 10 (a one-line operation in Groovy, I 
> believe the same in Ruby and Python). 
> 
> This isn’t the first time I’ve heard of a similar request though, so I think 
> if you were to open a Jira requesting a “GetLine(s)” or “SliceText” 
> processor, it could be valuable to the community. The current component 
> solution would probably involve SplitText/SplitContent and as you said, 
> decent overhead, especially if the desired content is early in the flowfile. 
> 
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Apr 11, 2017, at 9:38 AM, Sven Davison <[email protected]> wrote:
>> 
>> I'm looking to parse some HTML. It's not the cleanest, but I know that my 
>> content is always on line 10 of the file. I could use SplitText and then 
>> compare the result to ensure it starts with XYZBeginningString, I suppose, 
>> but I'm looking for something with less overhead, especially knowing the 
>> content is always on line 10.
>> 
>> Anyone have other/cleaner ideas on how to get the content of line 10?
>
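
For reference, the "stream and slice at line 10" idea Andy describes could be 
sketched in Python roughly as follows. This is only an illustration under a 
few assumptions: read_line is a hypothetical helper name, the content is 
assumed to be UTF-8 text, and the NiFi ExecuteScript wiring (reading from and 
writing to the flowfile through the session) is not shown. The point is that 
the stream is consumed lazily, so nothing after the target line is parsed.

```python
import io
from itertools import islice


def read_line(stream, line_number, encoding="utf-8"):
    """Return the 1-indexed line from a binary stream, or None if it is too short.

    Hypothetical helper for illustration; not part of NiFi.
    """
    if line_number < 1:
        raise ValueError("line_number is 1-indexed")
    # islice pulls lines lazily, so iteration stops right after the target line.
    raw = next(islice(stream, line_number - 1, line_number), None)
    if raw is None:
        return None
    # Strip only the line terminator, keeping any other whitespace intact.
    return raw.decode(encoding).rstrip("\r\n")


if __name__ == "__main__":
    # Stand-in for flowfile content: 20 numbered lines.
    sample = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1, 21)))
    line = read_line(sample, 10)
    # Optional sanity check, similar to the XYZBeginningString comparison.
    if line is not None and line.startswith("line"):
        print(line)
```

A prefix check like the one at the end covers the case where the expected 
content is not on line 10 after all, without splitting the whole file first.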
