Hi Michael. This is a very clever approach: convert the zip (for which
UnpackContent does not preserve file metadata on extracted files) to a tar
(for which UnpackContent does preserve file metadata), then run
UnpackContent on the tar.

One quick follow-up question. The ExecuteStreamCommand will be in the NiFi
flow, so its input will be the incoming flowfiles streamed to it, and its
output will be streamed out as a flowfile. Are these the two commands in the
script where we capture the incoming flowfile

cat /dev/stdin >> $tmpzipfile

...and where we create the output flowfile from the ExecuteStreamCommand
processor?

cat $tmptarfile >> /dev/stdout
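One way to check this outside NiFi, as a minimal sketch: ExecuteStreamCommand writes the incoming flowfile content to the command's stdin and turns whatever the command writes to stdout into the outgoing flowfile, so running data through those same two cat steps should hand it back unchanged. The function name passthrough and the sample text are hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for the script: capture stdin in a temp file,
# then write that temp file back out on stdout.
passthrough() {
  local tmp
  tmp=$(mktemp)
  cat /dev/stdin >> "$tmp"   # capture the incoming flowfile content
  cat "$tmp" >> /dev/stdout  # emit the outgoing flowfile content
  rm -f "$tmp"
}

# Simulate ExecuteStreamCommand: feed content to stdin, collect stdout.
result=$(printf 'flowfile bytes' | passthrough)
echo "$result"
```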


On Thu, Feb 1, 2024 at 10:11 AM Michael Moser <[email protected]> wrote:

> Hi Jim,
>
> The ExecuteStreamCommand will only output 1 flowfile, so using it to unzip
> in this fashion won't yield the results you need.
>
> Instead, you might try a workaround with ExecuteStreamCommand to unzip
> your file and then tar to repackage it.  Then UnpackContent should be able
> to read the tar file metadata.  I have used ExecuteStreamCommand to execute
> bash scripts.  An example is shown below, which you can modify for your
> needs.  The ExecuteStreamCommand properties "Command Path=/bin/bash" and
> "Command Arguments=/path/to/script.sh" are all you need for this script to
> work.
>
> #!/bin/bash
> tmpzipfile=$(mktemp)
> tmptarfile=$(mktemp)
> #remove the tmptarfile file, we just need a temporary filename, and will
> #recreate it below
> rm -f $tmptarfile
> #create a directory to unzip files to
> tmpdir=$(mktemp -d)
>
> cat /dev/stdin >> $tmpzipfile
> # here is your unzip command to unzip $tmpzipfile to $tmpdir, preserving
> # file metadata
> # here is your tar command to tar $tmpdir to $tmptarfile
> cat $tmptarfile >> /dev/stdout
>
> #cleanup
> rm -f $tmpzipfile
> rm -f $tmptarfile
> rm -rf $tmpdir
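A filled-in version of the script above, as a sketch under stated assumptions: Info-ZIP unzip (which restores each entry's modification time by default) and tar are installed on the NiFi host, and zip_to_tar is a hypothetical function name. unzip's messages go to stderr so they cannot leak into the outgoing flowfile, and a trap removes the temporary files even if a step fails. Owner and group are a separate matter: unzip only restores UID/GID with its -X option, which usually needs elevated privileges.

```shell
#!/bin/bash
# Sketch only: the placeholder unzip and tar steps filled in.
# The body runs in a subshell so set -e and the trap stay local to it.
zip_to_tar() (
  set -euo pipefail
  tmpzipfile=$(mktemp)
  tmptarfile=$(mktemp)
  tmpdir=$(mktemp -d)
  # Remove the temporary files and directory however the subshell exits.
  trap 'rm -rf "$tmpzipfile" "$tmptarfile" "$tmpdir"' EXIT

  # ExecuteStreamCommand writes the incoming flowfile content to stdin.
  cat > "$tmpzipfile"

  # Extract; unzip restores modification times by default. Any output it
  # produces is sent to stderr so it stays out of the tar on stdout.
  unzip -qq -o "$tmpzipfile" -d "$tmpdir" >&2

  # tar records modification time, permissions and owner for each entry,
  # and -cf overwrites the empty file created by mktemp.
  tar -cf "$tmptarfile" -C "$tmpdir" .

  # Whatever is written to stdout becomes the outgoing flowfile content.
  cat "$tmptarfile"
)
```

To use it from ExecuteStreamCommand, save the function in the script file and add a final line that simply calls zip_to_tar.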
>
>
>
> On Wed, Jan 31, 2024 at 12:55 PM James McMahon <[email protected]>
> wrote:
>
>> If anyone can show me how to get my ExecuteStreamCommand configured
>> properly as a workaround, I am still interested in that.
>> Jim
>>
>> On Wed, Jan 31, 2024 at 12:39 PM James McMahon <[email protected]>
>> wrote:
>>
>>> I tried to find a Create option for tickets here:
>>> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues
>>> I did not find one, and suspect I may not have that privilege.
>>> In any case, thank you for creating that.
>>> Jim
>>>
>>> On Wed, Jan 31, 2024 at 12:37 PM Joe Witt <[email protected]> wrote:
>>>
>>>> I went ahead and wrote it up here
>>>> https://issues.apache.org/jira/browse/NIFI-12709
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Jan 31, 2024 at 10:30 AM James McMahon <[email protected]>
>>>> wrote:
>>>>
>>>>> Happy to do that, Joe. How do I create and submit a JIRA for
>>>>> consideration? I have not done one, at least not for years.
>>>>> If you get me started, I will write a concise and thorough description in
>>>>> the ticket.
>>>>> Sincerely,
>>>>> Jim
>>>>>
>>>>> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt <[email protected]> wrote:
>>>>>
>>>>>> James,
>>>>>>
>>>>>> It makes sense to create a JIRA to improve UnpackContent to extract
>>>>>> these attributes when a zip file happens to include them.
>>>>>> The lastModifiedDate appears to be easily accessible in the metadata,
>>>>>> if available.  Owner/Creator/Creation information looks less standard in
>>>>>> the case of a zip, but is perhaps still capturable from extra fields.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I tried to use UnpackContent to extract the files within a zip file
>>>>>>> named ABC DEF (1).zip (the filename contains spaces).
>>>>>>>
>>>>>>> UnpackContent seemed to work, but it did not preserve file
>>>>>>> attributes from the files in the zip. For example,
>>>>>>> file.lastModifiedTime is not available, so downstream I am unable to do
>>>>>>> this:
>>>>>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>>>>>
>>>>>>> I did some digging and found that on the UnpackContent page, it says:
>>>>>>> file.lastModifiedTime  "The date and time that the unpacked file
>>>>>>> was last modified (*tar only*)."
>>>>>>>
>>>>>>> I need these file attributes for those files I extract from the zip.
>>>>>>> So as an alternative I tried configuring an ExecuteStreamCommand
>>>>>>> processor like this:
>>>>>>> Command Arguments  -c;"unzip -p -q < -"
>>>>>>> Command Path  /bin/bash
>>>>>>> Argument Delimiter   ;
>>>>>>>
>>>>>>> It throws these errors:
>>>>>>>
>>>>>>> 16:41:30 UTC ERROR 13023d28-6154-17fd-b4e8-7a30b35980ca
>>>>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to write flow file to stdin due to Broken pipe: java.io.IOException: Broken pipe
>>>>>>>
>>>>>>> 16:41:30 UTC ERROR 13023d28-6154-17fd-b4e8-7a30b35980ca
>>>>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable command /bin/bash ended in an error: /bin/bash: -: No such file or directory
>>>>>>> It does not seem to be applying the unzip to the stdin of the ESC
>>>>>>> processor. None of the files in the zip archive are output from ESC.
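The second error can be reproduced in a plain shell, as a minimal sketch. In bash, the "-" after "<" is an ordinary filename rather than a name for stdin, so the command fails at once without reading its input, which is likely why NiFi then reports a broken pipe when writing the flowfile to stdin. Separately, the Info-ZIP unzip man page notes that archives read from standard input are not supported (funzip can extract only the first member), which is presumably why a temporary file is needed at all.

```shell
#!/bin/bash
# Work in an empty scratch directory so no file named "-" exists.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# bash treats "-" after "<" as a filename, so this fails the same way
# the ExecuteStreamCommand configuration did.
if printf 'data' | bash -c 'cat < -' 2>/dev/null; then
  redirect_status="succeeded"
else
  redirect_status="failed"
fi

# cat itself interprets a "-" argument as stdin, so this reads the pipe.
cat_output=$(printf 'data' | bash -c 'cat -')

echo "redirect from '-': $redirect_status"
echo "cat - read: $cat_output"

cd / && rmdir "$scratch"
```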
>>>>>>>
>>>>>>> What needs to be changed in my ESC configuration?
>>>>>>>
>>>>>>> Thank you in advance for any help.
>>>>>>>
>>>>>>>
