Thanks for the ideas, guys! For reasons beyond my control, I can't update to the newest NiFi to get the GetHTML processor at this time. Maybe some day. I'll look into ExecuteScript and SplitText more.

On Tue, Apr 11, 2017 at 1:14 PM, Jeremy Dyer <[email protected]> wrote:

> Sven,
>
> There is also the GetHTML processor I added a while back. If the input is
> valid HTML you should always be able to use a CSS selector to extract that
> HTML value. If you can provide a sample of the HTML I would be glad to make
> a flow for you doing so as an example.
>
> Jeremy
>
> On Apr 11, 2017, at 1:01 PM, Andy LoPresto <[email protected]> wrote:
>
> Sven,
>
> Currently I would recommend using ExecuteScript and simply streaming &
> slicing the content bytes at line 10 (a one-line operation in Groovy, I
> believe the same in Ruby and Python).
>
> This isn’t the first time I’ve heard of a similar request though, so I
> think if you were to open a Jira requesting a “GetLine(s)” or “SliceText”
> processor, it could be valuable to the community. The current component
> solution would probably involve SplitText/SplitContent and, as you said,
> decent overhead, especially if the desired content is early in the
> flowfile.
>
> Andy LoPresto
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Apr 11, 2017, at 9:38 AM, Sven Davison <[email protected]> wrote:
>
> I'm looking to parse some HTML. It's not the cleanest, but I know that my
> content is always on line 10 of the file. I could use SplitText and then
> compare it to ensure it starts with XYZBeginningString, I suppose, but I'm
> looking for something with less overhead, especially knowing the content is
> always on line 10.
>
> Anyone have other/cleaner ideas on how to get the content of line 10?
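
For reference, here is a rough sketch of the "stream and slice at line 10" idea from Andy's reply, written as plain Python rather than as an actual ExecuteScript body. The function name get_line and the sample data are made up for illustration, and the wiring into a NiFi session (reading the flowfile content stream and writing the result back) is deliberately left out. It assumes the content is newline-delimited and UTF-8 encoded.

```python
import io
from itertools import islice


def get_line(stream, line_number=10, encoding="utf-8"):
    """Return line `line_number` (1-indexed) from a binary stream.

    Returns None if the stream has fewer lines than requested.
    """
    # islice consumes the stream lazily, so reading stops right after
    # the requested line instead of splitting the whole content.
    raw = next(islice(stream, line_number - 1, line_number), None)
    if raw is None:
        return None
    return raw.decode(encoding).rstrip("\r\n")


# Hypothetical sample content standing in for the flowfile bytes.
sample = "".join(f"line {i}\n" for i in range(1, 21)).encode("utf-8")
print(get_line(io.BytesIO(sample)))
```

A sanity check such as `line.startswith("XYZBeginningString")` on the returned value would cover the "make sure it is the right line" step without going through SplitText.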
