I don't know what your use case is, but we avoid anything beyond gzip because S3 is so cheap.
On Thu, Sep 29, 2022 at 10:51 AM James McMahon <[email protected]> wrote: > > Thank you Mark. Had no idea there was this file-based dependency to 7z files. > Since my workaround appears to be working I think I may just move forward > with that. > Steve, Mark - thank you again for replying. > Jim > > On Thu, Sep 29, 2022 at 9:15 AM Mark Payne <[email protected]> wrote: >> >> It’s been a while. But if I remember correctly, the reason that NiFi does >> not natively support 7-zip format is that with 7-zip, the dictionary is >> written at the end of the file. >> So when data is compressed, the dictionary is built up during compression >> and written at the end. This makes sense from a compression standpoint. >> However, what it means, is that in order to decompress it, you must first >> jump to the end of the file in order to access the dictionary. Then jump >> back to the beginning of the file in order to perform the decompression. >> NiFi makes use of Input Streams and Output Streams for FlowFIle access - it >> doesn’t provide a File-based approach. And this ability to jump to the end, >> read the dictionary, and then jump back to the beginning isn’t really >> possible with Input/Output Streams - at least, not without buffering >> everything into memory. >> >> So it would make sense that there would be a “Not Implemented” error when >> attempting to do the same thing using the 7-zip application directly, when >> attempting to use input streams & output streams. >> I think that if you’re stuck with 7-zip, your own option will be to do what >> you’re doing - write the data out as a file, run the 7-zip application >> against that file, writing the output to some directory, and then picking up >> the files from that directory. >> The alternative, of course, would be to update the source so that it’s >> creating zip files instead of 7-zip files, if you have sway over the source >> producer. >> >> Thanks >> -Mark >> >> >> On Sep 29, 2022, at 8:58 AM, stephen.hindmarch.bt.com via users >> <[email protected]> wrote: >> >> James, >> >> E_NOTIMPL means that feature is not implemented. I can see there is >> discussion about this down at sourceforge but the detail is blocked by my >> employer’s firewall. >> >> p7zip / Discussion / Help: E_NOTIMPL for stdin / stdout pipe >> >> https://sourceforge.net/p/p7zip/discussion/383044/thread/8066736d >> >> Steve Hindmarch >> >> From: James McMahon <[email protected]> >> Sent: 29 September 2022 12:12 >> To: Hindmarch,SJ,Stephen,VIR R <[email protected]> >> Cc: [email protected] >> Subject: Re: Can ExecuteStreamCommand do this? >> >> I ran with these Command Arguments in the ExecuteStreamCommand configuration: >> x;-si;-so;-spf;-aou >> ${filename} removed, -si indicating use of STDIN, -so STDOUT. >> >> The same error is thrown by 7z through ExecuteStreamCommand: Executable >> command /bin/7za ended in an error: ERROR: Can not open the file as an >> archive E_NOTIMPL >> >> I tried this at the command line, getting the same failure: >> cat testArchive.7z | 7za x -si -so | dd of=stooges.txt >> >> >> On Thu, Sep 29, 2022 at 6:44 AM James McMahon <[email protected]> wrote: >> >> Good morning, Steve. Indeed, that second paragraph is exactly how I did get >> this to work. I unpack to disk and then read in the twelve results using a >> GetFile. So far it is working well. It just feels a little wrong to me to do >> this, as I have introduced an extra write to and read from disk, which is >> going to be slower than doing it all in memory within the JVM. While that >> may not seem like anything significant for a single 7z file, as we work >> across thousands and thousands it can be significant. >> >> I am about to try what you suggested above: dropping the ${filename} >> entirely from the STDIN / STDOUT configuration. I realize it is not likely >> going to give me the twelve output flowfiles I'm seeking in the "output >> stream" path from ExecuteStreamCommand. I just want to see if it works >> without throwing that error. >> >> Welcome any other thoughts or comments you may have. Thanks again for your >> comments so far. >> >> Jim >> >> On Thu, Sep 29, 2022 at 5:23 AM <[email protected]> wrote: >> >> James, >> >> I have been thinking more about your problem and this may be the wrong >> approach. If you successfully unpack your files into the flow file content, >> you will still have one output flow file containing the unpacked contents of >> all of your files. If you need 12 separate files in their own flowfiles then >> you will need to find some way of splitting them up. Is there a byte >> sequence you can use in a SplitContent process, or a specific file length >> you can use in SplitText? >> >> Otherwise you may be better off using ExecuteStreamCommand to unpack the >> files on disk. Run it verbosely and use the output of that step to create a >> list of the locations where your recently unpacked files are. Or create a >> temporary directory to unpack in and fetch all the files in there, cleaning >> up aftwerwards. Then you can load the files with FetchFile. FetchFile can be >> instructed to delete the file it has just read so can also clean up after >> itself. >> >> Steve Hindmarch >> >> From: stephen.hindmarch.bt.com via users <[email protected]> >> Sent: 29 September 2022 09:19 >> To: [email protected]; [email protected] >> Subject: RE: Can ExecuteStreamCommand do this? >> >> James, >> >> Using ${filename} and -si together seems wrong to me. What happens when you >> try that on the command line? >> >> Steve Hindmarch >> >> From: James McMahon <[email protected]> >> Sent: 28 September 2022 13:49 >> To: [email protected]; Hindmarch,SJ,Stephen,VIR R >> <[email protected]> >> Subject: Re: Can ExecuteStreamCommand do this? >> >> Thank you Steve. I 've employed a ListFile/FetchFile to load the 7z files >> into the flow . When I have my ESC configured like this following, I get my >> unpacked files results to the #{unpacked.destination} directory on disk: >> Command Arguments >> x;${filename};-spf;-o#{unpacked.destination};-aou >> Command Path /bin/7a >> Ignore STDIN true >> Working Directory #{unpacked.destination} >> Argument Delimiter ; >> Output Destination Attribute No value set >> I get twelve files in my output destination folder. >> >> When I try this one, get an error and no output: >> Command Arguments x;${filename};-si;-so;-spf;-aou >> Command Path /bin/7a >> Ignore STDIN false >> Working Directory #{unpacked.destination} >> Argument Delimiter ; >> Output Destination Attribute No value set >> >> This yields this error... >> Executable command /bin/7za ended in an error: ERROR: Can not open the file >> as archive >> E_NOTIMPL >> ...and it yields only one flowfile result in Output Stream, and that is a >> brief text/plain report of the results of the 7za extraction like this: >> >> This indicates it did indeed find my 7z file and it did indeed identify the >> 12 files in it, yet still I get no output to my outgoing flow path: >> Extracting archive: /parent/subparent/testArchive.7z >> - - >> Path = /parentdir/subdir/testArchive.7z >> Type = 7z >> Physical Size = 7204 >> Headers Size = 298 >> Method = LZMA2:96k >> Solid = + >> Blocks = 1 >> >> Everything is Ok >> >> Folders: 1 >> Files: 12 >> Size: 90238 >> Compressed: 7204 >> >> ${filename} in both cases is a fully qualified name to the file, like this: >> /dir/subdir/myTestFile.7z. >> >> I can't seem to get the ESC output stream to be the extracted files. >> Anything jump out at you? >> >> On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com via users >> <[email protected]> wrote: >> >> Hi James, >> >> I am not in a position to test this right now, but you have to think of the >> flowfile content as STDIN and STDOUT. So with 7zip you need to use the “-si” >> and “-so” flags to ensure there are no files involved. Then if you can load >> the content of a file into a flowfile, eg with GetFile, then you should be >> able to unpack it with ExecuteStreamCommand. Set “Ignore STDIN” = “false”. >> >> I have written up my own use case on github. This involves having a Redis >> script as the input, and results of the script as the output. >> >> my-nifi-cluster/experiment-redis_direct.md at main · >> hindmasj/my-nifi-cluster · GitHub >> >> The first part of the post shows how to do it with the input commands on the >> command line, so a bit like you running “7za ${filename} -so”. The second >> part has the script inside the flowfile and is treated as STDIN, a bit like >> you doing “unzip -si -so”. >> >> See if that helps. Fundamentally, if you do “7za -si -so < myfile.7z” on the >> command line and see the output on the console, ExecuteStreamCommand will >> behave the same. >> >> Steve Hindmarch >> From: James McMahon <[email protected]> >> Sent: 28 September 2022 12:02 >> To: [email protected] >> Subject: Can ExecuteStreamCommand do this? >> >> I continue to struggle with ExecuteStreamCommand, and am hoping one of you >> from our user community can help me with the following: >> 1. Can ExecuteStreamCommand be used as I am trying to use it? >> 2. Can you direct me to an example where ExecuteStreamCommand is configured >> to do something similar to my use case? >> >> My use case: >> The incoming flowfiles in my flow path are 7z zips. Based on what I've >> researched so far, NiFi's native processors don't handle unpacking of 7z >> files. >> >> I want to read the 7z files as STDIN to ExecuteStreamCommand. >> I'd like the processor to call out to a 7za app, which will unpack the 7z. >> One incoming flowfile will yield multiple output files. Let's say twelve in >> this case. >> My goal is to output those twelve as new flowfiles out of >> ExecuteStreamCommand, to its output stream path. >> >> I can't yet get this to work. Best I've been able to do is configure >> ExecuteStreamCommand to unpack ${filename} to a temporary output directory >> on disk. Then I have another path in my flow polling that directory every >> few minutes looking for new data. Am hoping to eliminate that intermediate >> write/read to/from disk by keeping this all within the flow and JVM memory. >> >> Thanks very much in advance for any assistance. >> >>
