Huagen,

Here is an example [1] which does what you are asking. This is a quick hack, 
and a better option is probably to use InvokeScriptedProcessor [2], which is 
explained well by Matt Burgess on his blog [3][4]. However, with this method, 
you do not need to modify the internal code of NiFi at all. You can simply drop 
the contents of testInvokeListFileProcessor.groovy into the ExecuteScript 
processor (or reference the external file), configure the properties from the 
test, and run.

Quick overview:

The test (independently) lists the files in a directory for comparison later, 
sets up an ExecuteScript processor, configures the necessary properties, sends 
an incoming flowfile with the directory path as content and file filter as an 
attribute, and executes the processor.

The script consumes the incoming flowfile, extracts the directory path from the 
content, extracts the file filter from the attribute, sets up a few more 
hard-coded values (like min/max size and age), and then invokes ListFile and 
returns the massaged output as a new flowfile.
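
To make the listing step concrete, here is a minimal sketch of the same idea in plain Python. It is not the Groovy script from [1] and it does not use the NiFi API. The function name list_files and its parameters are hypothetical, and full-match semantics for the file filter are an assumption. It takes a directory path and a filename regex (the two values the script pulls from the flowfile), applies size and age bounds, and returns the matching paths.

```python
import os
import re
import time


def list_files(directory, file_filter=r".*", min_size=0, max_size=None,
               min_age=0.0, max_age=None, now=None):
    """Return sorted paths of regular files in `directory` passing every check.

    Hypothetical stand-in for the listing step; not NiFi's implementation.
    Sizes are in bytes, ages in seconds since last modification.
    """
    pattern = re.compile(file_filter)
    now = time.time() if now is None else now
    matches = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        # Assumption: the whole filename must match the filter.
        if pattern.fullmatch(entry.name) is None:
            continue
        info = entry.stat()
        age = now - info.st_mtime
        if info.st_size < min_size:
            continue
        if max_size is not None and info.st_size > max_size:
            continue
        if age < min_age:
            continue
        if max_age is not None and age > max_age:
            continue
        matches.append(entry.path)
    return sorted(matches)
```

In the flow described above, the directory would come from the flowfile content and the filter from the attribute, while the size and age bounds would be the hard-coded values.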

Again, this is a bit hacky, but it accomplishes what you are asking for. As I 
said above, for a production system I would recommend that you write a custom 
processor using InvokeScriptedProcessor which does something similar (and 
doesn’t rely on mocking so much of the framework to interact with ListFile).

[1] https://github.com/apache/nifi/compare/master...alopresto:groovyListFileDemo?expand=1
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
[3] http://funnifi.blogspot.com/2016/02/invokescriptedprocessor-hello-world.html
[4] http://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html


Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 31, 2016, at 4:11 PM, Huagen peng <[email protected]> wrote:
> 
> Andy,
> 
> Could you please explain how to invoke a ListFile processor from the 
> ExecuteScript processor?  Is it an API call?
> 
> Huagen
> 
>> On May 31, 2016, at 3:23 PM, Andy LoPresto <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Huagen,
>> 
>> I understand your issue. You can report a Jira [1] to request those 
>> processors be able to accept input, but I don’t believe that change is 
>> likely. One solution would be to extend the ListFile processor [2] as it is 
>> not a final class, and create your own “DynamicListFile” processor which 
>> accepts an incoming flowfile and populates the monitored directory from the 
>> flowfile contents. You may encounter issues with this approach if the 
>> directory changes, as the internal state maintenance of ListFile may behave 
>> unusually.
>> 
>> Another solution would be to use the ExecuteScript [3] processor with a 
>> small Groovy script which would accept an incoming flowfile, parse the 
>> contents to determine the desired directory, and then configure and invoke 
>> the ListFile processor directly, carrying the output over to one or more 
>> new flowfiles.
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI/
>> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListFile.java
>> [3] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/main/java/org/apache/nifi/processors/script/ExecuteScript.java
>> 
>> 
>> 
>> Andy LoPresto
>> [email protected] <mailto:[email protected]>
>> [email protected] <mailto:[email protected]>
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>>> On May 31, 2016, at 12:08 PM, Huagen peng <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Thank you for your suggestion, Andy and Lee.
>>> 
>>> I am aware of the flow using ListFile-FetchFile-HashContent. I didn’t go 
>>> that route because the ListFile processor does not allow an upstream 
>>> processor. I have an upstream processor, from which I know the directory I 
>>> want to work with.  I ended up passing the directory name into the 
>>> ExecuteStreamCommand processor to get ALL the files under the directory. 
>>> After that I use SplitText and ExtractText to filter the files with the 
>>> desired file extension, and then I use FetchFile and HashContent to finish 
>>> what I want to do.
>>> 
>>> If ListFile allowed upstream input, it would have made my data flow much 
>>> easier.  The same goes for the ListSFTP processor.
>>> 
>>> Huagen
>>> 
>>>> On May 31, 2016, at 2:56 PM, Lee Laim <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Huagen,
>>>> 
>>>> I had a similar workflow and eventually replaced 
>>>> ExecuteStreamCommand(md5sum) with HashContent.
>>>> 
>>>> Using  ListFile->FetchFile->HashContent, the resultant hash is placed into 
>>>> the flowfile under the attribute ${hash.value}.
>>>> This processor offers ~40 algorithms to choose from, including md5.   
>>>> Compared to the ExecuteStreamCommand, the HashContent processor offers a 
>>>> bit more in error-handling and lineage traceability in this specific case.
>>>> 
>>>> Thanks,
>>>> -Lee
>>>> 
>>>> 
>>>> On Tue, May 31, 2016 at 11:24 AM, Andy LoPresto <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> Huagen,
>>>> 
>>>> The ExecuteStreamCommand is used to run a command against the contents of 
>>>> an incoming flowfile. For example, you could have a ListFile processor 
>>>> listing all .gz files in the directory and passing them to the 
>>>> ExecuteStreamCommand processor to generate the MD5 hash of each. In this 
>>>> case, you would not need a wildcard character in the command.
>>>> 
>>>> The configuration for the processors is as follows:
>>>> 
>>>> ListFile:
>>>>    -Input directory: <the directory where the files are located>
>>>>    -File Filter: [^\.].*\.gz
>>>> 
>>>> ExecuteStreamCommand:
>>>>    -Command arguments: ${filename}
>>>>    -Command path: md5
>>>>    -Working Directory: <the directory where the files are located>
>>>>    -Output Destination Attribute: md5hash
>>>> 
>>>> Notes:
>>>>    -I am using “md5” rather than “md5sum” as I am on Mac OS X.
>>>>    -You could use the “-q” flag for “md5” to suppress extraneous output
>>>>    -You could use “${absolute.path}/${filename}” as the command arguments, 
>>>> in which case you would not need to set the working directory
>>>> 
>>>> Andy LoPresto
>>>> [email protected] <mailto:[email protected]>
>>>> [email protected] <mailto:[email protected]>
>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>> 
>>>>> On May 31, 2016, at 7:02 AM, Huagen peng <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Hi, I would like to run an md5sum command on all the *.gz files under a 
>>>>> certain directory.  However, I keep getting this error:
>>>>> md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory
>>>>> 
>>>>> I tried quoting the * wildcard character, and adding a . dot or / in 
>>>>> front, to no avail.  Can I do something like this with the 
>>>>> ExecuteStreamCommand processor?
>>>>> 
>>>>> Thanks.
>>>> 
>>>> 
>>> 
>> 
> 
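
A note on the wildcard error in the original question at the bottom of this thread: the message shows that the literal string '*.gz' reached md5sum, which is what happens when a command is launched directly rather than through a shell, since it is the shell that normally expands wildcards. The sketch below, in plain Python rather than NiFi, illustrates the list-then-hash approach suggested in the replies: expand the pattern explicitly, then hash each file on its own. The function names md5_of_file and md5_of_matching are hypothetical.

```python
import glob
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_matching(directory, pattern="*.gz"):
    """Expand the wildcard ourselves, then hash each match separately."""
    # glob.glob performs the expansion a shell would normally do.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    return {os.path.basename(p): md5_of_file(p) for p in paths}
```

This mirrors the ListFile -> FetchFile -> HashContent flow: one step produces the list of matching files, and a separate step computes one hash per file.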
