I ran with these Command Arguments in the ExecuteStreamCommand
configuration:
x;-si;-so;-spf;-aou
I removed ${filename}; -si tells 7za to read the archive from STDIN, and
-so tells it to write the extracted data to STDOUT.

The same error is thrown by 7z through ExecuteStreamCommand: Executable
command /bin/7za ended in an error: ERROR: Can not open the file as an
archive  E_NOTIMPL

I tried this at the command line, getting the same failure:
cat testArchive.7z | 7za x -si -so | dd of=stooges.txt
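
One possible explanation, which I have not confirmed: as far as I know,
7za has to seek within a .7z archive to find its headers, so it may not be
able to read that format from a pipe at all. If that is right, a small
wrapper could spool STDIN to a temporary file first and hand 7za a real
path. A minimal sketch, where spool_and_run is a name I made up for
illustration (it is not part of 7za or NiFi):

```sh
# spool_and_run CMD [ARGS...]
# Hypothetical helper: copy STDIN to a temporary file, then run
# CMD ARGS <tempfile>, so the command gets a seekable file, not a pipe.
spool_and_run() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"          # drain STDIN (the flowfile content) to disk
    "$@" "$tmp"           # e.g. 7za x -so /tmp/tmp.XXXXXX
    status=$?
    rm -f "$tmp"          # clean up whether or not the command succeeded
    return $status
}

# Intended use (not run here, assumes 7za is on the PATH):
#   cat testArchive.7z | spool_and_run 7za x -so > stooges.txt
```

In ExecuteStreamCommand this would live in a script used as the Command
Path, with Ignore STDIN set to false. It still writes the archive to disk
once, and -so still sends all the extracted files to STDOUT as one
concatenated stream.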

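To get the twelve files as separate flowfiles rather than one concatenated
stream, an untested idea is to extract into a scratch directory and write
that directory to STDOUT as a tar stream, which NiFi's UnpackContent
processor should then be able to split (tar is one of its packaging
formats). A sketch under those assumptions; repack_as_tar is a
hypothetical name, not an existing tool:

```sh
# repack_as_tar CMD [ARGS...]
# Hypothetical helper: spool STDIN to a file, run CMD ARGS <file> inside an
# empty scratch directory, then write that directory to STDOUT as tar.
repack_as_tar() {
    work=$(mktemp -d) || return 1
    archive="$work/input.bin"
    mkdir "$work/out"
    cat > "$archive"                      # flowfile content arrives on STDIN
    # The extractor's own messages go to /dev/null so that only the tar
    # stream reaches STDOUT.
    ( cd "$work/out" && "$@" "$archive" >/dev/null ) || {
        rm -rf "$work"
        return 1
    }
    tar -cf - -C "$work/out" .            # one tar entry per extracted file
    status=$?
    rm -rf "$work"
    return $status
}

# Intended use (not run here, assumes 7za is on the PATH):
#   repack_as_tar 7za x -y < testArchive.7z
```

This keeps everything in one processor call and cleans up after itself,
but the extraction still goes through disk, so it does not remove the
extra write and read.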

On Thu, Sep 29, 2022 at 6:44 AM James McMahon <[email protected]> wrote:

> Good morning, Steve. Indeed, that second paragraph is *exactly* how I did
> get this to work. I unpack to disk and then read in the twelve results
> using a GetFile. So far it is working well. It just feels a little wrong to
> me to do this, as I have introduced an extra write to and read from disk,
> which is going to be slower than doing it all in memory within the JVM.
> While that may not seem like anything significant for a single 7z file, as
> we work across thousands and thousands it can be significant.
>
> I am about to try what you suggested above: dropping the ${filename}
> entirely from the STDIN / STDOUT configuration. I realize it is not likely
> going to give me the twelve output flowfiles I'm seeking in the "output
> stream" path from ExecuteStreamCommand. I just want to see if it works
> without throwing that error.
>
> Welcome any other thoughts or comments you may have. Thanks again for your
> comments so far.
>
> Jim
>
> On Thu, Sep 29, 2022 at 5:23 AM <[email protected]> wrote:
>
>> James,
>>
>>
>>
>> I have been thinking more about your problem and this may be the wrong
>> approach. If you successfully unpack your files into the flow file content,
>> you will still have one output flow file containing the unpacked contents
>> of all of your files. If you need 12 separate files in their own flowfiles
>> then you will need to find some way of splitting them up. Is there a byte
>> sequence you can use in a SplitContent process, or a specific file length
>> you can use in SplitText?
>>
>>
>>
>> Otherwise you may be better off using ExecuteStreamCommand to unpack the
>> files on disk. Run it verbosely and use the output of that step to create a
>> list of the locations where your recently unpacked files are. Or create a
>> temporary directory to unpack in and fetch all the files in there, cleaning
>> up afterwards. Then you can load the files with FetchFile. FetchFile can
>> be instructed to delete the file it has just read so can also clean up
>> after itself.
>>
>>
>>
>> *Steve Hindmarch*
>>
>>
>>
>> *From:* stephen.hindmarch.bt.com via users <[email protected]>
>> *Sent:* 29 September 2022 09:19
>> *To:* [email protected]; [email protected]
>> *Subject:* RE: Can ExecuteStreamCommand do this?
>>
>>
>>
>> James,
>>
>>
>>
>> Using ${filename} and -si together seems wrong to me. What happens when
>> you try that on the command line?
>>
>>
>>
>> *Steve Hindmarch*
>>
>>
>>
>> *From:* James McMahon <[email protected]>
>> *Sent:* 28 September 2022 13:49
>> *To:* [email protected]; Hindmarch,SJ,Stephen,VIR R <
>> [email protected]>
>> *Subject:* Re: Can ExecuteStreamCommand do this?
>>
>>
>>
>> Thank you Steve. I've employed a ListFile/FetchFile to load the 7z files
>> into the flow. When I have my ESC configured as follows, I get my
>> unpacked files in the #{unpacked.destination} directory on disk:
>>
>> Command Arguments
>> x;${filename};-spf;-o#{unpacked.destination};-aou
>>
>> Command Path                    /bin/7za
>>
>> Ignore STDIN                       true
>>
>> Working Directory                #{unpacked.destination}
>>
>> Argument Delimiter               ;
>>
>> Output Destination Attribute  No value set
>>
>> I get twelve files in my output destination folder.
>>
>>
>>
>> When I try this one, I get an error and no output:
>>
>> Command Arguments            x;${filename};-si;-so;-spf;-aou
>>
>> Command Path                    /bin/7za
>>
>> Ignore STDIN                       false
>>
>> Working Directory                #{unpacked.destination}
>>
>> Argument Delimiter               ;
>>
>> Output Destination Attribute  No value set
>>
>>
>>
>> This yields this error...
>>
>> Executable command /bin/7za ended in an error: ERROR: Can not open the
>> file as archive
>>
>> E_NOTIMPL
>>
>> ...and it yields only one flowfile result in Output Stream, and that is a
>> brief text/plain report of the results of the 7za extraction like this:
>>
>>
>>
>> This indicates it did indeed find my 7z file and it did indeed identify
>> the 12 files in it, yet still I get no output to my outgoing flow path:
>>
>> Extracting archive: /parent/subparent/testArchive.7z
>>
>> --
>>
>> Path = /parentdir/subdir/testArchive.7z
>>
>> Type = 7z
>>
>> Physical Size = 7204
>>
>> Headers Size = 298
>>
>> Method = LZMA2:96k
>>
>> Solid = +
>>
>> Blocks = 1
>>
>>
>>
>> Everything is Ok
>>
>>
>>
>> Folders: 1
>>
>> Files: 12
>>
>> Size: 90238
>>
>> Compressed: 7204
>>
>>
>>
>> ${filename} in both cases is a fully qualified name to the file, like
>> this: /dir/subdir/myTestFile.7z.
>>
>>
>>
>> I can't seem to get the ESC output stream to be the extracted files.
>> Anything jump out at you?
>>
>>
>>
>> On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com via users
>> <[email protected]> wrote:
>>
>> Hi James,
>>
>>
>>
>> I am not in a position to test this right now, but you have to think of
>> the flowfile content as STDIN and STDOUT. So with 7zip you need to use the
>> “-si” and “-so” flags to ensure there are no files involved. Then if you
>> can load the content of a file into a flowfile, e.g. with GetFile, then you
>> should be able to unpack it with ExecuteStreamCommand. Set “Ignore STDIN” =
>> “false”.
>>
>>
>>
>> I have written up my own use case on github. This involves having a Redis
>> script as the input, and results of the script as the output.
>>
>>
>>
>> my-nifi-cluster/experiment-redis_direct.md at main ·
>> hindmasj/my-nifi-cluster · GitHub
>> <https://github.com/hindmasj/my-nifi-cluster/blob/main/docs/experiment-redis_direct.md>
>>
>>
>>
>> The first part of the post shows how to do it with the input commands on
>> the command line, so a bit like you running “7za x ${filename} -so”. The
>> second part has the script inside the flowfile and is treated as STDIN, a
>> bit like you doing “7za x -si -so”.
>>
>>
>>
>> See if that helps. Fundamentally, if you do “7za x -si -so < myfile.7z” on
>> the command line and see the output on the console, ExecuteStreamCommand
>> will behave the same.
>>
>>
>>
>> *Steve Hindmarch*
>>
>> *From:* James McMahon <[email protected]>
>> *Sent:* 28 September 2022 12:02
>> *To:* [email protected]
>> *Subject:* Can ExecuteStreamCommand do this?
>>
>>
>>
>> I continue to struggle with ExecuteStreamCommand, and am hoping one of
>> you from our user community can help me with the following:
>>
>> 1. Can ExecuteStreamCommand be used as I am trying to use it?
>>
>> 2. Can you direct me to an example where ExecuteStreamCommand is
>> configured to do something similar to my use case?
>>
>>
>>
>> My use case:
>>
>> The incoming flowfiles in my flow path are 7z zips. Based on what I've
>> researched so far, NiFi's native processors don't handle unpacking of 7z
>> files.
>>
>>
>>
>> I want to read the 7z files as STDIN to ExecuteStreamCommand.
>>
>> I'd like the processor to call out to a 7za app, which will unpack the
>> 7z.
>>
>> One incoming flowfile will yield multiple output files. Let's say twelve
>> in this case.
>>
>> My goal is to output those twelve as new flowfiles out of
>> ExecuteStreamCommand, to its output stream path.
>>
>>
>>
>> I can't yet get this to work. Best I've been able to do is configure
>> ExecuteStreamCommand to unpack ${filename} to a temporary output directory
>> on disk. Then I have another path in my flow polling that directory every
>> few minutes looking for new data. Am hoping to eliminate that intermediate
>> write/read to/from disk by keeping this all within the flow and JVM memory.
>>
>>
>>
>> Thanks very much in advance for any assistance.
>>
>>
