Re: parsing html

Sven Davison Thu, 13 Apr 2017 07:33:14 -0700

Thanks for the ideas, guys! For reasons beyond my control, I can't update to
the newest NiFi to get the GetHTML processor at this time. Maybe some day.
I'll look into ExecuteScript and SplitText more.
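
For reference, here is a rough sketch of the "stream and slice at line 10"
idea from Andy's reply quoted below, written as plain Python rather than as an
actual ExecuteScript body. The function name get_line and its parameters are
made up for illustration, and the NiFi session/flowfile wiring is left out
entirely, so treat it as a starting point under those assumptions and not as
a tested processor script.

```python
import io
from itertools import islice


def get_line(stream, line_number=10, expected_prefix=None):
    """Return line `line_number` (1-based) of a text stream, or None.

    `get_line` is a hypothetical helper, not part of NiFi. It reads lazily
    and stops once the requested line has been consumed, so the rest of the
    content is never scanned.
    """
    # islice skips the first (line_number - 1) lines and yields only the target
    line = next(islice(stream, line_number - 1, line_number), None)
    if line is None:
        return None  # content has fewer lines than requested
    line = line.rstrip("\r\n")
    # Optional sanity check, e.g. that the line starts with XYZBeginningString
    if expected_prefix is not None and not line.startswith(expected_prefix):
        return None
    return line


if __name__ == "__main__":
    # Made-up sample content: nine filler lines, then the line of interest
    filler = "".join("<p>filler {}</p>\n".format(i) for i in range(1, 10))
    sample = io.StringIO(filler + "XYZBeginningString <b>payload</b>\n<p>tail</p>\n")
    print(get_line(sample, 10, "XYZBeginningString"))
```

Returning None both for short content and for a prefix mismatch keeps the
caller's routing simple (one "no match" path); split those cases if the flow
needs to tell them apart.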


On Tue, Apr 11, 2017 at 1:14 PM, Jeremy Dyer <[email protected]> wrote:

> Sven,
>
> There is also the GetHTML processor I added a while back. If the input is
> valid HTML you should always be able to use a CSS selector to extract that
> HTML value. If you can provide a sample of the HTML I would be glad to make
> a flow for you doing so as an example.
>
> Jeremy
>
> On Apr 11, 2017, at 1:01 PM, Andy LoPresto <[email protected]> wrote:
>
> Sven,
>
> Currently I would recommend using ExecuteScript and simply streaming &
> slicing the content bytes at line 10 (a one-line operation in Groovy, I
> believe the same in Ruby and Python).
>
> This isn’t the first time I’ve heard of a similar request though, so I
> think if you were to open a Jira requesting a “GetLine(s)” or “SliceText”
> processor, it could be valuable to the community. The current component
> solution would probably involve SplitText/SplitContent and as you said,
> decent overhead, especially if the desired content is early in the
> flowfile.
>
> Andy LoPresto
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Apr 11, 2017, at 9:38 AM, Sven Davison <[email protected]> wrote:
>
> I'm looking to parse some HTML. It's not the cleanest, but I know that my
> content is always on line 10 of the file. I could use SplitText and then
> compare it to ensure it starts with XYZBeginningString, I suppose, but I'm
> looking for something with less overhead, especially knowing the content is
> always on line 10.
>
> Anyone have other/cleaner ideas on how to get the content of line 10?