I ran with these Command Arguments in the ExecuteStreamCommand
configuration:
x;-si;-so;-spf;-aou
${filename} removed, -si indicating use of STDIN, -so STDOUT.

The same error is thrown by 7z through ExecuteStreamCommand:

Executable command /bin/7za ended in an error: ERROR: Can not open the file as an archive
E_NOTIMPL

I tried this at the command line, getting the same failure:

cat testArchive.7z | 7za x -si -so | dd of=stooges.txt

On Thu, Sep 29, 2022 at 6:44 AM James McMahon <[email protected]> wrote:

> Good morning, Steve. Indeed, that second paragraph is *exactly* how I did
> get this to work. I unpack to disk and then read in the twelve results
> using a GetFile. So far it is working well. It just feels a little wrong to
> me to do this, as I have introduced an extra write to and read from disk,
> which is going to be slower than doing it all in memory within the JVM.
> While that may not seem like anything significant for a single 7z file, as
> we work across thousands and thousands it can be significant.
>
> I am about to try what you suggested above: dropping the ${filename}
> entirely from the STDIN / STDOUT configuration. I realize it is not likely
> going to give me the twelve output flowfiles I'm seeking in the "output
> stream" path from ExecuteStreamCommand. I just want to see if it works
> without throwing that error.
>
> Welcome any other thoughts or comments you may have. Thanks again for your
> comments so far.
>
> Jim
>
> On Thu, Sep 29, 2022 at 5:23 AM <[email protected]> wrote:
>
>> James,
>>
>> I have been thinking more about your problem and this may be the wrong
>> approach. If you successfully unpack your files into the flow file content,
>> you will still have one output flow file containing the unpacked contents
>> of all of your files. If you need 12 separate files in their own flowfiles
>> then you will need to find some way of splitting them up. Is there a byte
>> sequence you can use in a SplitContent process, or a specific file length
>> you can use in SplitText?
>>
>> Otherwise you may be better off using ExecuteStreamCommand to unpack the
>> files on disk. Run it verbosely and use the output of that step to create a
>> list of the locations where your recently unpacked files are. Or create a
>> temporary directory to unpack in and fetch all the files in there, cleaning
>> up afterwards. Then you can load the files with FetchFile. FetchFile can
>> be instructed to delete the file it has just read, so it can also clean up
>> after itself.
>>
>> *Steve Hindmarch*
>>
>> *From:* stephen.hindmarch.bt.com via users <[email protected]>
>> *Sent:* 29 September 2022 09:19
>> *To:* [email protected]; [email protected]
>> *Subject:* RE: Can ExecuteStreamCommand do this?
>>
>> James,
>>
>> Using ${filename} and -si together seems wrong to me. What happens when
>> you try that on the command line?
>>
>> *Steve Hindmarch*
>>
>> *From:* James McMahon <[email protected]>
>> *Sent:* 28 September 2022 13:49
>> *To:* [email protected]; Hindmarch,SJ,Stephen,VIR R <[email protected]>
>> *Subject:* Re: Can ExecuteStreamCommand do this?
>>
>> Thank you Steve. I've employed a ListFile/FetchFile to load the 7z files
>> into the flow. When I have my ESC configured like the following, I get my
>> unpacked files written to the #{unpacked.destination} directory on disk:
>>
>> Command Arguments
>> x;${filename};-spf;-o#{unpacked.destination};-aou
>>
>> Command Path /bin/7a
>>
>> Ignore STDIN true
>>
>> Working Directory #{unpacked.destination}
>>
>> Argument Delimiter ;
>>
>> Output Destination Attribute No value set
>>
>> I get twelve files in my output destination folder.
>>
>> When I try this one, I get an error and no output:
>>
>> Command Arguments x;${filename};-si;-so;-spf;-aou
>>
>> Command Path /bin/7a
>>
>> Ignore STDIN false
>>
>> Working Directory #{unpacked.destination}
>>
>> Argument Delimiter ;
>>
>> Output Destination Attribute No value set
>>
>> This yields this error...
>>
>> Executable command /bin/7za ended in an error: ERROR: Can not open the
>> file as archive
>>
>> E_NOTIMPL
>>
>> ...and it yields only one flowfile result in Output Stream, and that is a
>> brief text/plain report of the results of the 7za extraction like this:
>>
>> This indicates it did indeed find my 7z file and it did indeed identify
>> the 12 files in it, yet still I get no output to my outgoing flow path:
>>
>> Extracting archive: /parent/subparent/testArchive.7z
>>
>> --
>>
>> Path = /parentdir/subdir/testArchive.7z
>>
>> Type = 7z
>>
>> Physical Size = 7204
>>
>> Headers Size = 298
>>
>> Method = LZMA2:96k
>>
>> Solid = +
>>
>> Blocks = 1
>>
>> Everything is Ok
>>
>> Folders: 1
>>
>> Files: 12
>>
>> Size: 90238
>>
>> Compressed: 7204
>>
>> ${filename} in both cases is a fully qualified name to the file, like
>> this: /dir/subdir/myTestFile.7z.
>>
>> I can't seem to get the ESC output stream to be the extracted files.
>> Anything jump out at you?
>>
>> On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com
>> via users <[email protected]> wrote:
>>
>> Hi James,
>>
>> I am not in a position to test this right now, but you have to think of
>> the flowfile content as STDIN and STDOUT. So with 7zip you need to use the
>> “-si” and “-so” flags to ensure there are no files involved. Then if you
>> can load the content of a file into a flowfile, eg with GetFile, then you
>> should be able to unpack it with ExecuteStreamCommand. Set “Ignore STDIN” =
>> “false”.
>>
>> I have written up my own use case on github. This involves having a Redis
>> script as the input, and results of the script as the output.
>>
>> my-nifi-cluster/experiment-redis_direct.md at main ·
>> hindmasj/my-nifi-cluster · GitHub
>> <https://github.com/hindmasj/my-nifi-cluster/blob/main/docs/experiment-redis_direct.md>
>>
>> The first part of the post shows how to do it with the input commands on
>> the command line, so a bit like you running “7za ${filename} -so”. The
>> second part has the script inside the flowfile and is treated as STDIN, a
>> bit like you doing “unzip -si -so”.
>>
>> See if that helps. Fundamentally, if you do “7za -si -so < myfile.7z” on
>> the command line and see the output on the console, ExecuteStreamCommand
>> will behave the same.
>>
>> *Steve Hindmarch*
>>
>> *From:* James McMahon <[email protected]>
>> *Sent:* 28 September 2022 12:02
>> *To:* [email protected]
>> *Subject:* Can ExecuteStreamCommand do this?
>>
>> I continue to struggle with ExecuteStreamCommand, and am hoping one of
>> you from our user community can help me with the following:
>>
>> 1. Can ExecuteStreamCommand be used as I am trying to use it?
>>
>> 2. Can you direct me to an example where ExecuteStreamCommand is
>> configured to do something similar to my use case?
>>
>> My use case:
>>
>> The incoming flowfiles in my flow path are 7z zips. Based on what I've
>> researched so far, NiFi's native processors don't handle unpacking of 7z
>> files.
>>
>> I want to read the 7z files as STDIN to ExecuteStreamCommand.
>>
>> I'd like the processor to call out to a 7za app, which will unpack the
>> 7z.
>>
>> One incoming flowfile will yield multiple output files. Let's say twelve
>> in this case.
>>
>> My goal is to output those twelve as new flowfiles out of
>> ExecuteStreamCommand, to its output stream path.
>>
>> I can't yet get this to work. Best I've been able to do is configure
>> ExecuteStreamCommand to unpack ${filename} to a temporary output directory
>> on disk. Then I have another path in my flow polling that directory every
>> few minutes looking for new data. Am hoping to eliminate that intermediate
>> write/read to/from disk by keeping this all within the flow and JVM memory.
>>
>> Thanks very much in advance for any assistance.
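
For anyone landing on this thread later: the failing command-line test above suggests that 7za will not extract a .7z archive from STDIN, so one possible workaround is a small wrapper that ExecuteStreamCommand calls in place of 7za. The following is only a rough sketch under several assumptions, not something tested in NiFi. It assumes 7za is on the PATH, that a temporary directory is acceptable, and that a downstream UnpackContent processor (Packaging Format set to tar) would split the result into one flowfile per extracted file. The names repack_as_tar and main are hypothetical and exist only for this illustration.

```python
import os
import shutil
import subprocess
import sys
import tarfile
import tempfile


def repack_as_tar(extract_dir, out_stream):
    """Write every regular file under extract_dir to out_stream as one tar stream."""
    # "w|" writes an uncompressed tar to a non-seekable stream such as STDOUT.
    with tarfile.open(fileobj=out_stream, mode="w|") as tar:
        for root, _dirs, files in os.walk(extract_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Store paths relative to the extraction directory.
                tar.add(path, arcname=os.path.relpath(path, extract_dir))


def main(seven_zip="7za"):
    """Spool STDIN to a temporary .7z file, extract it, emit a tar on STDOUT."""
    # Assumption: 7za needs a real file for .7z input, so the flowfile
    # content is copied to disk first. The directory is removed on exit.
    with tempfile.TemporaryDirectory() as work:
        archive = os.path.join(work, "input.7z")
        out_dir = os.path.join(work, "out")
        os.mkdir(out_dir)
        with open(archive, "wb") as spooled:
            shutil.copyfileobj(sys.stdin.buffer, spooled)
        # Discard 7za's progress report so STDOUT carries only the tar stream.
        subprocess.run(
            [seven_zip, "x", archive, "-o" + out_dir, "-aou", "-y"],
            check=True,
            stdout=subprocess.DEVNULL,
        )
        repack_as_tar(out_dir, sys.stdout.buffer)
```

To try it, save the sketch as a script that calls main() on its last line, point Command Path at the Python interpreter, pass the script path in Command Arguments, and leave Ignore STDIN set to false. This still touches disk once, but it removes the separate polling path and keeps the unpacked files inside the flow.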
