Thank you for your suggestions, Andy and Lee. I am aware of the ListFile->FetchFile->HashContent flow. I didn’t go that route because the ListFile processor does not accept input from an upstream processor. I have an upstream processor that tells me which directory I want to work with. I ended up passing the directory name to the ExecuteStreamCommand processor to get ALL the files under the directory. After that, I use SplitText and ExtractText to keep only the files with the desired file extension, and then I use FetchFile and HashContent to finish what I want to do.
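
For anyone reading along, the logic of that flow (list everything in a directory, keep the files with the wanted extension, hash each one) can be sketched outside NiFi in a few lines of Python. This is only an illustration, not NiFi code: the function name `md5_of_files`, the default `.gz` extension, and the non-recursive listing are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of_files(directory, extension=".gz"):
    """Return {file path: MD5 hex digest} for the files directly inside
    `directory` whose names end with `extension`.

    Hypothetical helper for illustration; it does not descend into
    subdirectories.
    """
    digests = {}
    for path in sorted(Path(directory).iterdir()):
        # Filter step: skip subdirectories and files with another extension.
        if not path.is_file() or not path.name.endswith(extension):
            continue
        # Hash step: read in chunks so large archives are not loaded whole.
        md5 = hashlib.md5()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                md5.update(chunk)
        digests[str(path)] = md5.hexdigest()
    return digests
```

Calling `md5_of_files("/tmp/transfer/16-05-22_00")` would return one digest per .gz file in that directory. Because the filtering happens in code, no shell wildcard is involved.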
If ListFile allowed upstream input, it would have made my data flow much easier. The same goes for the ListSFTP processor.

Huagen

> On May 31, 2016, at 2:56 PM, Lee Laim <[email protected]> wrote:
>
> Huagen,
>
> I had a similar workflow and eventually replaced ExecuteStreamCommand(md5sum)
> with HashContent.
>
> Using ListFile->FetchFile->HashContent, the resultant hash is placed into
> the flowfile under the attribute ${hash.value}.
> This processor offers ~40 algorithms to choose from, including md5.
> Compared to the ExecuteStreamCommand, the HashContent processor offers a bit
> more in error-handling and lineage traceability in this specific case.
>
> Thanks,
> -Lee
>
>
> On Tue, May 31, 2016 at 11:24 AM, Andy LoPresto <[email protected]> wrote:
> Huagen,
>
> The ExecuteStreamCommand is used to run a command against the contents of an
> incoming flowfile. For example, you could have a ListFile processor listing
> all .gz files in the directory and passing them to the ExecuteStreamCommand
> processor to generate the MD5 hash of each. In this case, you would not need
> a wildcard character in the command.
>
> The configuration for the processors is as follows:
>
> ListFile:
> -Input directory: <the directory where the files are located>
> -File Filter: [^\.]\.gz
>
> ExecuteStreamCommand:
> -Command arguments: ${filename}
> -Command path: md5
> -Working Directory: <the directory where the files are located>
> -Output Destination Attribute: md5hash
>
> Notes:
> -I am using “md5” rather than “md5sum” as I am on Mac OS X.
> -You could use the “-n” flag for “md5” to suppress extraneous output
> -You could use “${absolute.path}/${filename}” as the command arguments,
> in which case you would not need to set the working directory
>
> Andy LoPresto
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
>> On May 31, 2016, at 7:02 AM, Huagen peng <[email protected]> wrote:
>>
>> Hi, I would like to run a md5sum command on all the *.gz files under a
>> certain directory. However, I keep getting this error:
>> md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory
>>
>> I tried quoting the * wild character, adding a . dot or / in front with no
>> avail. Can I do something like this with the ExecuteStreamCommand processor?
>>
>> Thanks.
