Re: parsing html

Jeremy Dyer Thu, 13 Apr 2017 07:39:34 -0700

No problem, Sven. Just curious, which version do you have? If I recall
correctly, it was available as early as version 0.5.1.


On Thu, Apr 13, 2017 at 10:32 AM, Sven Davison <[email protected]>
wrote:

> Thanks for the ideas, guys! For reasons beyond my control, I can't update
> to the newest NiFi to get the GetHTML processor at this time. Maybe some
> day. I'll look into ExecuteScript and/or SplitText more.
>
> On Tue, Apr 11, 2017 at 1:14 PM, Jeremy Dyer <[email protected]> wrote:
>
>> Sven,
>>
>> There is also the GetHTML processor I added a while back. If the input is
>> valid HTML, you should always be able to use a CSS selector to extract the
>> HTML value you need. If you can provide a sample of the HTML, I would be
>> glad to build an example flow for you.
>>
>> Jeremy
>>
>> Sent from my iPhone
>>
>> On Apr 11, 2017, at 1:01 PM, Andy LoPresto <[email protected]> wrote:
>>
>> Sven,
>>
>> Currently I would recommend using ExecuteScript and simply streaming &
>> slicing the content bytes at line 10 (a one-line operation in Groovy and,
>> I believe, in Ruby and Python as well).
>>
>> This isn’t the first time I’ve heard of a similar request though, so I
>> think if you were to open a Jira requesting a “GetLine(s)” or “SliceText”
>> processor, it could be valuable to the community. The current component
>> solution would probably involve SplitText/SplitContent and, as you said,
>> carry decent overhead, especially if the desired content is early in the
>> flowfile.
>>
>> Andy LoPresto
>> [email protected]
>> *[email protected] <[email protected]>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Apr 11, 2017, at 9:38 AM, Sven Davison <[email protected]> wrote:
>>
>> I'm looking to parse some HTML. It's not the cleanest, but I know that my
>> content is always on line 10 of the file. I could use SplitText and then
>> compare the result to ensure it starts with XYZBeginningString, I suppose,
>> but I'm looking for something with less overhead, especially knowing the
>> content is always on line 10.
>>
>> Anyone have other/cleaner ideas on how to get the content of line 10?
>>
>>
>>
>
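
The "stream and slice at line 10" idea discussed in this thread can be
sketched in plain Python roughly as follows. This is only a minimal sketch
under stated assumptions, not a drop-in ExecuteScript body: the function
names read_line and extract_content are hypothetical, the NiFi session and
flowfile plumbing is left out entirely, and UTF-8 content is assumed.
XYZBeginningString is the placeholder prefix from the original question.

```python
import io
from itertools import islice
from typing import BinaryIO, Optional


def read_line(stream: BinaryIO, line_number: int,
              encoding: str = "utf-8") -> Optional[str]:
    """Return line `line_number` (1-based), or None if the stream is shorter.

    Hypothetical helper: iteration stops right after the target line, so
    the rest of the content is never split into lines.
    """
    if line_number < 1:
        raise ValueError("line_number is 1-based")
    # islice consumes lines lazily and yields only the requested one.
    raw = next(islice(stream, line_number - 1, line_number), None)
    if raw is None:
        return None
    return raw.decode(encoding).rstrip("\r\n")


def extract_content(stream: BinaryIO, line_number: int = 10,
                    expected_prefix: str = "XYZBeginningString") -> Optional[str]:
    """Return the target line only if it starts with the expected prefix."""
    line = read_line(stream, line_number)
    if line is None or not line.startswith(expected_prefix):
        return None
    return line


# Made-up sample input: nine filler lines, the payload on line 10, then a tail.
lines = [f"<p>filler {i}</p>" for i in range(1, 10)]
lines += ["XYZBeginningString payload", "<p>tail</p>"]
sample = "\n".join(lines).encode("utf-8")

print(extract_content(io.BytesIO(sample)))
```

With the made-up sample above, the script prints the tenth line. The prefix
check mirrors the sanity test mentioned in the original question; returning
None for a short or unexpected input leaves it to the caller to decide how
to route a failure.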
