One problem with the above flow is that ExecuteStreamCommand replaces
the contents of the FlowFile with the output of the command, so the
FlowFile will contain only the encoding value and no longer have the
original content.

This could potentially be solved in the future with the "hold file"
processor [1]: the original file is held on one path while the same file
goes to ExecuteStreamCommand. After getting the encoding, it could be
extracted to an attribute, which would then trigger the release of the
original file, copying over the encoding attribute.
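
To make the "extract the encoding, then convert" steps concrete, here is a
rough sketch in plain Python, outside of NiFi. It assumes the command output
looks like "name: mime/type; charset=xxx", which is what file -i usually
prints. The function names parse_charset and to_utf8_unix are made up for
illustration and are not part of NiFi.

```python
import re

# Matches the "charset=..." part of a `file -i` output line (assumed format).
CHARSET_RE = re.compile(r"charset=([\w.:-]+)")


def parse_charset(file_output):
    # Return the charset named in the output, lowercased, or None if absent.
    match = CHARSET_RE.search(file_output)
    return match.group(1).lower() if match else None


def to_utf8_unix(raw, charset):
    # Decode the original bytes using the detected charset, turn CRLF line
    # endings into LF, and re-encode the result as UTF-8.
    text = raw.decode(charset)
    return text.replace("\r\n", "\n").encode("utf-8")
```

Note that file -i can also report values such as "binary" or "unknown-8bit",
which Python cannot decode, so a real flow would need to route those files
somewhere else.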

[1] https://issues.apache.org/jira/browse/NIFI-190



On Tue, Oct 27, 2015 at 10:24 AM, Joe Percivall <[email protected]>
wrote:

> Hey Charlie,
>
> Sorry no one has followed up with you yet. One way I see around
> ConvertCharacterSet not supporting expression language is to route on
> attribute (assuming the character set is extracted to be an attribute) to
> different ConvertCharacterSet processors depending on the input character
> set.
>
> That being said, I don't see a reason why ConvertCharacterSet
> shouldn't support expression language. If no one has objections,
> I'll put in a ticket later today and knock it out real quick.
>
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
>
>
> On Sunday, October 25, 2015 7:13 PM, Charlie Frasure <
> [email protected]> wrote:
>
>
>
> I'm looking to process many files into common formats.  The source files
> are coming in various character sets, mime types, and new line terminators.
>
> My thinking for a data flow was along these lines:
>
> GetFile (from many sub directories) ->
> ExecuteStreamCommand (file -i) ->
> ConvertCharacterSet (from previous command to utf8) ->
> ReplaceText (to change any \r\n into \n) ->
> PutFile (into a directory structure based on values found in the original
> file path and filename)
>
> Additional steps would be added for archiving a copy of the original,
> converting xml files, etc.
>
> Attempting to process these with NiFi leaves me confused as to how to
> process within the tool.  If I want to ConvertCharacterSet, I have to know
> the input type.  I set up an ExecuteStreamCommand to file -i
> ${absolute.path:append(${filename})} which returned the expected values.  I
> don't see a way to turn these results into input for the processor, which
> doesn't accept expression language for that field.
>
> I also considered ConvertCSVToAvro as an interim step but notice the same
> issue.  Any suggestions what this dataflow should look like?
>
>
> Charlie
>
