Huagen,

I understand your issue. You can file a Jira [1] to request that those processors accept incoming flowfiles, but I don’t believe that change is likely. One solution would be to extend the ListFile processor [2], as it is not a final class, and create your own “DynamicListFile” processor which accepts an incoming flowfile and sets the monitored directory from the flowfile contents. You may encounter issues with this approach if the directory changes, as the internal state maintenance of ListFile may behave unusually.
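To illustrate the kind of state problem I have in mind, here is a toy model in Python. It is not NiFi code and makes no claim about how ListFile is actually implemented; it only assumes a lister that remembers a single "newest modification time seen" value. The names TimestampLister and make_file are hypothetical, invented for this sketch.

```python
import os
import tempfile


class TimestampLister:
    """Toy lister that remembers only the newest modification time it has seen."""

    def __init__(self):
        # Assumption: one piece of state shared across every directory listed.
        self.latest_seen = 0.0

    def list_new(self, directory):
        """Return names of files in `directory` modified after the stored timestamp."""
        new_files = []
        newest = self.latest_seen
        for entry in os.scandir(directory):
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime
            if mtime > self.latest_seen:
                new_files.append(entry.name)
                newest = max(newest, mtime)
        self.latest_seen = newest
        return sorted(new_files)


def make_file(directory, name, mtime):
    """Create a small file and force its modification time (helper for the demo)."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("x")
    os.utime(path, (mtime, mtime))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        make_file(dir_a, "new.gz", 2000)
        make_file(dir_b, "old.gz", 1000)
        lister = TimestampLister()
        print(lister.list_new(dir_a))  # the file in the first directory is listed
        print(lister.list_new(dir_b))  # the older file in the second directory is skipped
```

In this model, switching to a second directory whose files are older than the stored timestamp silently lists nothing. Keeping state per directory would avoid that in the sketch; whether something similar is needed in a real ListFile subclass is something you would have to check against the source [2].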
Another solution would be to use the ExecuteScript [3] processor with a small Groovy script which would accept an incoming flowfile, parse the contents to determine the desired directory, and then configure and invoke the ListFile processor directly, carrying the output to one or more new flowfiles.

[1] https://issues.apache.org/jira/browse/NIFI/
[2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListFile.java
[3] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/main/java/org/apache/nifi/processors/script/ExecuteScript.java

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On May 31, 2016, at 12:08 PM, Huagen peng <[email protected]> wrote:
>
> Thank you for your suggestion, Andy and Lee.
>
> I am aware of the flow using ListFile-FetchFile-HashContent. I didn’t go for
> that route because the ListFile processor does not allow an upstream processor.
> I have an upstream processor, from which I know the directory I want to work
> with. I ended up passing the directory name into the ExecuteStreamCommand
> processor to get ALL the files under the directory. After that I use
> SplitText and ExtractText to filter the files with the desired file
> extension, and then I use FetchFile and HashContent to finish what I want to
> do.
>
> If ListFile allowed upstream input, it would have made my data flow much
> easier. The same goes for the ListSFTP processor.
>
> Huagen
>
>> On May 31, 2016, at 2:56 PM, Lee Laim <[email protected]> wrote:
>>
>> Huagen,
>>
>> I had a similar workflow and eventually replaced
>> ExecuteStreamCommand(md5sum) with HashContent.
>>
>> Using ListFile->FetchFile->HashContent, the resultant hash is placed into
>> the flowfile under the attribute ${hash.value}.
>> This processor offers ~40 algorithms to choose from, including md5.
>> Compared to the ExecuteStreamCommand, the HashContent processor offers a bit
>> more in error-handling and lineage traceability in this specific case.
>>
>> Thanks,
>> -Lee
>>
>>
>> On Tue, May 31, 2016 at 11:24 AM, Andy LoPresto <[email protected]> wrote:
>> Huagen,
>>
>> The ExecuteStreamCommand is used to run a command against the contents of an
>> incoming flowfile. For example, you could have a ListFile processor listing
>> all .gz files in the directory and passing them to the ExecuteStreamCommand
>> processor to generate the MD5 hash of each. In this case, you would not need
>> a wildcard character in the command.
>>
>> The configuration for the processors is as follows:
>>
>> ListFile:
>> -Input directory: <the directory where the files are located>
>> -File Filter: [^\.]\.gz
>>
>> ExecuteStreamCommand:
>> -Command arguments: ${filename}
>> -Command path: md5
>> -Working Directory: <the directory where the files are located>
>> -Output Destination Attribute: md5hash
>>
>> Notes:
>> -I am using “md5” rather than “md5sum” as I am on Mac OS X.
>> -You could use the “-n” flag for “md5” to suppress extraneous output
>> -You could use “${absolute.path}/${filename}” as the command arguments,
>> in which case you would not need to set the working directory
>>
>> Andy LoPresto
>> [email protected]
>> [email protected]
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>>
>>> On May 31, 2016, at 7:02 AM, Huagen peng <[email protected]> wrote:
>>>
>>> Hi, I would like to run a md5sum command on all the *.gz files under a
>>> certain directory.
>>> However, I keep getting this error:
>>> md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory
>>>
>>> I tried quoting the * wildcard character, adding a . dot or / in front, to no
>>> avail. Can I do something like this with the ExecuteStreamCommand
>>> processor?
>>>
>>> Thanks.
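
A note on the original question at the bottom of the thread: the error message shows md5sum being handed the literal string '*.gz', which suggests the pattern was never expanded into file names before the command ran. Wildcard expansion is normally done by a shell, not by md5sum itself. As a rough sketch of the "expand first, then hash each file" step done outside NiFi, here is a small Python example. The function names md5_of_file and md5_of_matching are hypothetical and exist only for this illustration.

```python
import glob
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_matching(directory, pattern="*.gz"):
    """Expand the wildcard ourselves, then hash every matching file.

    Returns a dict mapping file name to hex digest.
    """
    # glob.escape protects any special characters in the directory name itself,
    # so only `pattern` is treated as a wildcard.
    search = os.path.join(glob.escape(directory), pattern)
    matches = sorted(glob.glob(search))
    return {os.path.basename(path): md5_of_file(path) for path in matches}


if __name__ == "__main__":
    # Directory taken from the error message quoted above.
    for name, digest in md5_of_matching("/tmp/transfer/16-05-22_00").items():
        print(digest, name)
```

This mirrors the ListFile-then-hash flow suggested earlier in the thread: one step produces concrete file names, and a separate step hashes each file.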
