Thanks, Adam. I can't believe I missed that note about delimiting with a
semicolon. I guess I was using the same format specified in the
ExecuteProcess processor, which says args are space delimited. Hmmm,
maybe there should be a change to make args handling consistent across
processors? Anyway, I tried various combinations and it still only
worked with no space, i.e. "-n+2". I have since opted for a different
approach and am no longer using tail.
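
To illustrate the delimiter point for anyone reading later, here is a
minimal Python sketch. It is not NiFi's implementation: split_args is a
hypothetical helper, and it assumes the property value is simply split on
the configured delimiter before the command is launched. Under that
assumption, a space-separated value stays a single argument when the
delimiter is a semicolon:

```python
def split_args(arg_string, delimiter):
    """Hypothetical helper: split a configured argument string into an
    argv-style list, dropping empty pieces. An assumption for illustration,
    not NiFi's actual code."""
    return [part for part in arg_string.split(delimiter) if part]


# Semicolon-delimited processor, value typed with a space:
print(split_args("-n +2", ";"))   # ['-n +2']     one argument containing a space

# Semicolon-delimited processor, value typed with a semicolon:
print(split_args("-n;+2", ";"))   # ['-n', '+2']  two separate arguments

# The no-space form is a single argument under either delimiter:
print(split_args("-n+2", ";"))    # ['-n+2']
print(split_args("-n+2", " "))    # ['-n+2']
```

If tail does receive "-n +2" as one argument, that could account for the
"tail -n2" behavior reported below. I have not confirmed that this is what
NiFi does, and the sketch does not explain why only the no-space form
worked.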

On Sun, Oct 25, 2015 at 1:12 AM, Adam Lamar <[email protected]> wrote:
> Mark,
>
>> If I configured the command arguments as
>> "-n +2" (without the quotes and space between the two parts), the
>> command would result in a "tail -n2" behavior.
>
> If you look at the tooltip for the Command Arguments property in
> ExecuteStreamCommand, you'll see that the arguments need to be delimited by
> a semicolon. Maybe try "-n;+2" instead? I'm not sure of the exact rules in
> NiFi, but I've seen similar behavior with regard to spaces in libraries that
> execute processes with command line arguments.
>
> There probably is a better way to process the CSV, but I'm afraid someone
> else will need to comment on that.
>
>> Seems like it will only unzip the
>> whole zip file and provide me index numbers for each file unpacked.
>
> A quick look at the UnpackContent source [1] suggests that there is no way
> to filter the filenames inside the zipfile prior to extraction. I agree that
> would be a useful feature. Maybe one of the NiFi devs will comment on the
> possibility of including it as a feature in the future.
>
> Cheers,
> Adam
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/UnpackContent.java#L304
>
>
>
> On 10/24/15 9:08 PM, Mark Petronic wrote:
>>
>> Just starting to use Nifi and built a flow that implements the following:
>>
>> unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast | hdfs dfs -put -
>> /some/hdfs/file
>>
>> I used the following processor flow:
>>
>> ExecuteProcess(unzip -p) -> ExecuteStreamCommand(tail -n +2) ->
>> CompressContent(gzip) -> PutHDFS
>>
>> Couple questions/observations:
>>
>> 1. I got hung up for a while on the ExecuteStreamCommand(tail -n +2)
>> part. I need that to strip the header line off of CSV files. I did not
>> see a simple way using a specific processor to strip off the first
>> line of a flow file. Is there a better way? But, I did notice a very
>> odd behavior of this command. If I configured the command arguments as
>> "-n +2" (without the quotes and space between the two parts), the
>> command would result in a "tail -n2" behavior. So, instead of giving
>> me all EXCEPT the first line, I only got the last 2 lines. However,
>> using "-n+2" (without the quotes and REMOVING the space) it worked as
>> expected. I believe this is confusing to the user. Both forms work
>> perfectly from the bash command line but only one works in Nifi?
>> Anyone care to comment on this? Should there be an enhancement to
>> remove this sort of inconsistent behavior?
>>
>> 2. Regarding my need to unzip ONLY one specific file from the zip
>> files (the one that matches *LMTD*), I did not see a way to do that
>> using the UnpackContent processor. Seems like it will only unzip the
>> whole zip file and provide me index numbers for each file unpacked.
>> This would be quite inefficient in my case because there are a number
>> of large files inside the zip file and I only need one. So, seems like
>> I am doing this the preferred way but, being new to Nifi, just wanted
>> to see if there are any other ideas on how to do this?
>>
>> Thanks in advance for thoughts on this
>
>
