Mark,

> If I configured the command arguments as
"-n +2" (without the quotes and space between the two parts), the
command would result in a "tail -n2" behavior.

If you look at the tooltip for the Command Arguments property in ExecuteStreamCommand, you'll see that the arguments need to be delimited by a semicolon. Maybe try "-n;+2" instead? I'm not sure of the exact rules in NiFi, but I've seen similar behavior with spaces in libraries that execute processes with command-line arguments.
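
To illustrate what I suspect is happening, here is a minimal Python sketch. It is not NiFi's actual code: build_argv is a hypothetical helper, and the only assumption is that the property value is split on semicolons and on nothing else. Under that assumption, "-n +2" reaches tail as one argument containing a space, and tail presumably reads the attached value " +2" as a plain count of 2, which would match the "tail -n2" behavior you saw.

```python
def build_argv(command, arguments, delimiter=";"):
    """Hypothetical helper: split an argument string on a delimiter only.

    Spaces are NOT treated as separators here, unlike in a shell.
    """
    return [command] + [arg for arg in arguments.split(delimiter) if arg]


# A space does not split the string, so tail gets a single argument.
print(build_argv("tail", "-n +2"))

# A semicolon splits it into the two arguments that bash would pass.
print(build_argv("tail", "-n;+2"))

# Without the space there is only one argument anyway, and tail accepts it.
print(build_argv("tail", "-n+2"))
```

The first call gives ['tail', '-n +2'], while the second gives ['tail', '-n', '+2'], which is the same argument list bash builds from "tail -n +2".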

There probably is a better way to process the CSV, but I'm afraid someone else will need to comment on that.

> Seems like it will only unzip the
whole zip file and provide me index numbers for each file unpacked.

A quick look at the UnpackContent source [1] suggests that there is no way to filter the filenames inside the zipfile prior to extraction. I agree that would be a useful feature. Maybe one of the NiFi devs will comment on the possibility of including it as a feature in the future.
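
For what it's worth, the kind of filtering I mean could look roughly like the Python sketch below. This is only an illustration of the idea, not anything UnpackContent does today; read_matching_entries and the sample file names are made up for the example. It decompresses only the entries whose names match a glob pattern, much like "unzip -p my.zip *LMTD*".

```python
import fnmatch
import io
import zipfile


def read_matching_entries(zip_bytes, pattern):
    """Return {name: content} for zip entries whose name matches a glob pattern."""
    matches = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Only matching entries are decompressed; the rest are skipped.
            if fnmatch.fnmatchcase(info.filename, pattern):
                matches[info.filename] = archive.read(info)
    return matches


def make_sample_zip():
    """Build a small in-memory zip (made-up file names) for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report_LMTD_01.csv", "header\nrow1\nrow2\n")
        archive.writestr("other_data.csv", "header\nrowA\n")
    return buffer.getvalue()


selected = read_matching_entries(make_sample_zip(), "*LMTD*")
print(sorted(selected))
```

Because the zip format keeps a central directory of entry names, the names can be checked before any file data is decompressed, so the large files you don't need are never unpacked.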

Cheers,
Adam


[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/UnpackContent.java#L304


On 10/24/15 9:08 PM, Mark Petronic wrote:
Just starting to use Nifi and built a flow that implements the following:

unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast | hdfs dfs -put -
/some/hdfs/file

I used the following processor flow:

ExecuteProcess(unzip -p) -> ExecuteStreamCommand(tail -n +2) ->
CompressContent(gzip) -> PutHDFS

Couple questions/observations:

1. I got hung up for a while on the ExecuteStreamCommand(tail -n +2)
part. I need that to strip the header line off of CSV files. I did not
see a simple way using a specific processor to strip off the first
line of a flow file. Is there a better way? But, I did notice a very
odd behavior of this command. If I configured the command arguments as
"-n +2" (without the quotes and space between the two parts), the
command would result in a "tail -n2" behavior. So, instead of giving
me all EXCEPT the first line, I only got the last 2 lines. However,
using "-n+2" (without the quotes and REMOVING the space) it worked as
expected. I believe this is confusing to the user. Both forms work
perfectly from the bash command line but only one works in Nifi?
Anyone care to comment on this? Should there be an enhancement to
remove this sort of inconsistent behavior?

2. Regarding my need to unzip ONLY one specific file from the zip
files (the one that matches *LMTD*), I did not see a way to do that
using the UnpackContent processor. Seems like it will only unzip the
whole zip file and provide me index numbers for each file unpacked.
This would be quite inefficient in my case because there are a number
of large files inside the zip file and I only need one. So, seems like
I am doing this the preferred way but, being new to Nifi, just wanted
to see if there are any other ideas on how to do this?

Thanks in advance for thoughts on this
