Hi Yolanda, Jeremy,

Thanks for your useful samples.
I will use them to tackle a slightly more complex case.

Regards
Stephane
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Stephane Tinseau
iSuite Technical Specialist

Thomson Reuters

Phone: +33 1 47 62 67 72

[email protected]<mailto:[email protected]>
thomsonreuters.com<http://thomsonreuters.com/>

From: Jeremy Dyer [mailto:[email protected]]
Sent: 31 August 2016 17:28
To: [email protected]
Subject: Re: working with HTML table

Stephane - Here is another option that uses only GetHTMLElement, without any 
ExecuteScript processor. It uses a CSS selector to pull the elements and then 
NiFi Expression Language to split and add the values. It isn't much different 
from what you had. You were very close.

On Wed, Aug 31, 2016 at 10:06 AM, Yolanda Davis 
<[email protected]<mailto:[email protected]>> wrote:
Hi Stephane,

Here's something I hope can help. In GetHTMLElement, instead of using the 
selector "table td", try "table tr" with an output type of "Text" and a 
destination of flowfile-content. This should create one flow file per row, 
each containing the numeric text extracted from that row's td elements. 
From there you can use the ExecuteScript processor to trim the whitespace, 
convert the text values into numbers and sum them. I was able to get this to 
work with the JavaScript (ECMAScript) below, using the example HTML you 
provided:

var flowFile = session.get();
if (flowFile != null) {

  // Java classes needed to rewrite the flow file content
  var StreamCallback = Java.type("org.apache.nifi.processor.io.StreamCallback");
  var IOUtils = Java.type("org.apache.commons.io.IOUtils");
  var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");

  flowFile = session.write(flowFile,
    new StreamCallback(function(inputStream, outputStream) {
        // Each flow file holds the text of one table row, e.g. "11 12"
        var text = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
        var res = text.trim().split(/\s+/);
        var count = 0;
        for (var i = 0; i < res.length; i++) {
          var value = parseInt(res[i], 10);
          // NaN never compares equal to anything, so test with isNaN()
          if (!isNaN(value)) {
            count += value;
          }
        }
        outputStream.write(count.toString().getBytes(StandardCharsets.UTF_8));
    }));
  flowFile = session.putAttribute(flowFile, "filename", flowFile.getId() + '_count.txt');
  session.transfer(flowFile, REL_SUCCESS);
}
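
To check the summing logic outside NiFi, the parsing step can be pulled into a plain function. This is only a minimal sketch: the name sumRowText is hypothetical and not part of NiFi, and it assumes each row arrives as whitespace-separated integer text such as "11 12".

```javascript
// Hypothetical helper, not part of NiFi: sums the integers found in the
// whitespace-separated text of one table row.
function sumRowText(text) {
  var tokens = text.trim().split(/\s+/);
  var total = 0;
  for (var i = 0; i < tokens.length; i++) {
    var value = parseInt(tokens[i], 10);
    // Skip tokens that are not numbers (for example, an empty string).
    if (!isNaN(value)) {
      total += value;
    }
  }
  return total;
}

console.log(sumRowText("11 12"));      // 23
console.log(sumRowText("  21\n 22 ")); // 43
```

Splitting on /\s+/ after trim() keeps stray spaces or line breaks in the row text from producing empty tokens.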

I've attached the template I used to do this which hopefully can help as well.  
Please let me know if you have any questions.

Yolanda


On Wed, Aug 31, 2016 at 3:52 AM, 
<[email protected]<mailto:[email protected]>> wrote:
Hi All,

I’m trying to extract values from an HTML table and do a calculation on them 
with NiFi.
The purpose of the test is to add up the TD values within each TR and 
output the result to a file.
For this sample the results should be 23 and 43.

My table looks like

<table>
     <tr>
          <td>11</td>
          <td>12</td>
     </tr>
     <tr>
          <td>21</td>
          <td>22</td>
     </tr>
</table>
My NiFi workflow is

InvokeHTTP > Response > GetHTMLElement > Success > PutFile

The CSS Selector for GetHTMLElement is table td.
I know that GetHTMLElement produces 0-N elements, but I don’t know how I can 
perform a calculation on them.
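
To make the expected result concrete, here is a minimal sketch of the per-row addition outside NiFi. It is an illustration only: sumRows is a hypothetical helper, not a NiFi API, and the regular expressions assume simple, attribute-free markup like the sample table.

```javascript
// Sample table from this message, inlined as a string.
var html = "<table>" +
  "<tr><td>11</td><td>12</td></tr>" +
  "<tr><td>21</td><td>22</td></tr>" +
  "</table>";

// Hypothetical helper (not a NiFi API): returns one sum per <tr>.
function sumRows(source) {
  var sums = [];
  var rowPattern = /<tr>([\s\S]*?)<\/tr>/g;
  var row;
  while ((row = rowPattern.exec(source)) !== null) {
    // A fresh pattern per row, so matching starts at the beginning of each row.
    var cellPattern = /<td>\s*(-?\d+)\s*<\/td>/g;
    var total = 0;
    var cell;
    while ((cell = cellPattern.exec(row[1])) !== null) {
      total += parseInt(cell[1], 10);
    }
    sums.push(total);
  }
  return sums;
}

console.log(sumRows(html)); // [ 23, 43 ]
```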

Any help would be appreciated.

Thanks
Regards
Stephane






--
--
[email protected]<mailto:[email protected]>
@YolandaMDavis

