Thank you for your suggestion, Andy and Lee.

I am aware of the ListFile-FetchFile-HashContent flow. I didn't go that route 
because the ListFile processor does not accept an incoming connection, and I 
have an upstream processor that tells me which directory I want to work with. 
I ended up passing the directory name to the ExecuteStreamCommand processor to 
list ALL the files under the directory. After that I use SplitText and 
ExtractText to keep only the files with the desired file extension, and then I 
use FetchFile and HashContent to finish the job.
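
For anyone who wants to see the logic outside of NiFi, the list, filter, and 
hash steps above could be sketched roughly as follows in Python. This is only 
an illustration and not part of my actual flow: the function names 
md5_of_file and hash_files_with_extension are made up for this example, and 
it only looks at files directly under the directory (no recursion).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large .gz files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files_with_extension(directory, extension=".gz"):
    """Map each matching file name directly under `directory` to its MD5.

    Hypothetical helper: list everything, keep regular files with the
    wanted extension, then hash each one.
    """
    return {
        entry.name: md5_of_file(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file() and entry.name.endswith(extension)
    }
```

Calling hash_files_with_extension("/tmp/transfer/16-05-22_00") would return a 
dictionary of file name to MD5 hex digest for the .gz files in that directory.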

If ListFile accepted upstream input, it would have made my data flow much 
simpler. The same goes for the ListSFTP processor.

Huagen

> On May 31, 2016, at 2:56 PM, Lee Laim <[email protected]> wrote:
> 
> Huagen,
> 
> I had a similar workflow and eventually replaced ExecuteStreamCommand(md5sum) 
> with HashContent.
> 
> Using ListFile->FetchFile->HashContent, the resulting hash is placed into 
> the flowfile under the attribute ${hash.value}.
> This processor offers ~40 algorithms to choose from, including md5.   
> Compared to the ExecuteStreamCommand, the HashContent processor offers a bit 
> more in error-handling and lineage traceability in this specific case.  
> 
> Thanks,
> -Lee
> 
> 
> On Tue, May 31, 2016 at 11:24 AM, Andy LoPresto <[email protected] 
> <mailto:[email protected]>> wrote:
> Huagen,
> 
> The ExecuteStreamCommand is used to run a command against the contents of an 
> incoming flowfile. For example, you could have a ListFile processor listing 
> all .gz files in the directory and passing them to the ExecuteStreamCommand 
> processor to generate the MD5 hash of each. In this case, you would not need 
> a wildcard character in the command. 
> 
> The configuration for the processors is as follows:
> 
> ListFile:
>       -Input directory: <the directory where the files are located>
>       -File Filter: [^\.].*\.gz
> 
> ExecuteStreamCommand:
>       -Command arguments: ${filename}
>       -Command path: md5
>       -Working Directory: <the directory where the files are located>
>       -Output Destination Attribute: md5hash
> 
> Notes:
>       -I am using “md5” rather than “md5sum” as I am on Mac OS X. 
>       -You could use the “-n” flag for “md5” to suppress extraneous output
>       -You could use “${absolute.path}/${filename}” as the command arguments, 
> in which case you would not need to set the working directory
>  
> Andy LoPresto
> [email protected] <mailto:[email protected]>
> [email protected] <mailto:[email protected]>
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On May 31, 2016, at 7:02 AM, Huagen peng <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hi, I would like to run an md5sum command on all the *.gz files under a 
>> certain directory.  However, I keep getting this error:
>> md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory
>> 
>> I tried quoting the * wildcard character and adding a . (dot) or / in front, 
>> to no avail.  Can I do something like this with the ExecuteStreamCommand 
>> processor?
>> 
>> Thanks.
> 
> 
