Hey Kiran,

This is possible in NiFi using the ExecuteProcess processor.
I would implement it as follows:
1. Get the file from the file system.
2. Route on the filename extension (tar.gz, zip, rar, etc.). When you create
relationships by adding expressions, you can use, for example:
${filename:matches('.+\.tar\.gz')}
3. Use ExecuteProcess to run a shell command that lists the archived files
(tar -tf file.tar.gz, unzip -l file.zip, or unrar l file.rar).
4. Analyze the content of the modified FlowFile, which now holds the file
listing, with the logic you already have for filenames, and finally
5. Unarchive and continue your flow, or stop the flow.
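
If you would rather not parse the output of tar or unzip, steps 3 and 4 could
also be done by a small script called from the flow. Below is a minimal sketch
in Python, assuming a Python interpreter is available on the NiFi host. The
function names and the suffix rule are hypothetical placeholders for your own
filename logic, and rar is not covered because the standard library has no
rar support.

```python
import tarfile
import zipfile


def list_archive_names(path):
    """Return member names of a tar (plain or compressed) or zip archive.

    Only archive headers are read; no member content is written to disk.
    A compressed tar is still decompressed as a stream to reach the headers.
    """
    # Check tar first: an uncompressed tar whose last member is a zip
    # could otherwise be mistaken for a zip file.
    if tarfile.is_tarfile(path):
        with tarfile.open(path, "r:*") as tf:
            return tf.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    raise ValueError(f"unsupported archive type: {path}")


def should_unpack(names, wanted_suffixes=(".csv", ".json")):
    """Hypothetical rule: unpack only if some entry has a wanted suffix."""
    return any(name.lower().endswith(wanted_suffixes) for name in names)
```

The script could print the names, or just a yes/no decision, so that the flow
can route on the result before UnpackContent runs.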

I hope I understood your requirements correctly and that this works for you.

Regards,
Ed.

On Thu, Mar 1, 2018 at 5:00 PM Kiran <kiran....@protonmail.com> wrote:

> Hello,
>
> I've got a NiFi flow which:
> 1. Ingest archive files (tar.gz, rar and zip)
> 2. IdentifyMimeType of the archive
> 3. UnpackContent of the archive
> 4. Identify which of the files can be processed based on *filename*
>
> The problem I've got is that a lot of processing time/content repo space
> is wasted by extracting the archive files and realising that I can't
> process the file based on the filename.
>
> I was wondering if there was any way of getting a list of the filenames
> within the archive without actually extracting the files? Based on the
> filenames I can then decide if I should unpack the archive or not.
>
> Kiran
>
>
>
