Sven,

There is also the GetHTMLElement processor I added a while back. If the input is valid HTML, you should always be able to use a CSS selector to extract the value you need. If you can provide a sample of the HTML, I would be glad to put together an example flow that does this.
Jeremy

Sent from my iPhone

> On Apr 11, 2017, at 1:01 PM, Andy LoPresto <[email protected]> wrote:
>
> Sven,
>
> Currently I would recommend using ExecuteScript and simply streaming &
> slicing the content bytes at line 10 (a one-line operation in Groovy, and I
> believe the same in Ruby and Python).
>
> This isn’t the first time I’ve heard a similar request, though, so I think
> if you were to open a Jira requesting a “GetLine(s)” or “SliceText”
> processor, it could be valuable to the community. The current component
> solution would probably involve SplitText/SplitContent and, as you said,
> decent overhead, especially if the desired content is early in the flowfile.
>
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
>> On Apr 11, 2017, at 9:38 AM, Sven Davison <[email protected]> wrote:
>>
>> I'm looking to parse some HTML. It's not the cleanest, but I know that my
>> content is always on line 10 of the file. I could use SplitText and then
>> compare it to ensure it starts with XYZBeginningString, I suppose, but I'm
>> looking for something with less overhead, especially knowing the content is
>> always on line 10.
>>
>> Does anyone have other/cleaner ideas on how to get the content of line 10?
>
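
For reference, here is a minimal sketch of the "stream and slice" idea quoted above, written as plain Python rather than as a finished ExecuteScript body. The function name `read_line` and the sample data are hypothetical, and the wiring into ExecuteScript (getting the flowfile and its input stream from the session) is deliberately left out. It only shows that the target line can be read without splitting the whole content.

```python
import io
from itertools import islice


def read_line(stream, line_number, encoding="utf-8"):
    """Return the 1-based line `line_number` from a binary stream.

    Returns None if the stream has fewer lines than requested.
    """
    if line_number < 1:
        raise ValueError("line_number must be 1 or greater")
    # islice pulls lines lazily, so nothing after the target line is read.
    raw = next(islice(stream, line_number - 1, line_number), None)
    if raw is None:
        return None
    return raw.decode(encoding).rstrip("\r\n")


if __name__ == "__main__":
    # Hypothetical 20-line input standing in for the flowfile content.
    sample = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1, 21)))
    print(read_line(sample, 10))
```

The returned string can then be checked with `startswith("XYZBeginningString")` before it is used, which covers the sanity check mentioned in the original question.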
