I'm looking to process many files into common formats.  The source files
arrive in various character sets, MIME types, and newline terminators.

My thinking for a data flow was along these lines:

GetFile (from many subdirectories) ->
ExecuteStreamCommand (file -i) ->
ConvertCharacterSet (from the charset reported by the previous command to UTF-8) ->
ReplaceText (to change any \r\n into \n) ->
PutFile (into a directory structure based on values found in the original
file path and filename)
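To make the middle of that flow concrete, here is a minimal sketch, outside NiFi, of what the ConvertCharacterSet and ReplaceText steps are meant to do to each file's bytes. It assumes the source charset is already known (for example, taken from the file -i output) and is a name Python's codecs recognize. The function name normalize_text is hypothetical.

```python
def normalize_text(raw: bytes, source_charset: str) -> bytes:
    """Hypothetical helper: decode bytes from the detected charset,
    turn CRLF line endings into LF, and re-encode as UTF-8."""
    # Equivalent of ConvertCharacterSet's input side: interpret the bytes.
    text = raw.decode(source_charset)
    # Equivalent of the ReplaceText step: change every \r\n into \n.
    text = text.replace("\r\n", "\n")
    # Equivalent of ConvertCharacterSet's output side: write UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    latin1_bytes = "caf\u00e9\r\n".encode("iso-8859-1")
    print(normalize_text(latin1_bytes, "iso-8859-1"))
```

The point of the sketch is that the decode step cannot run without source_charset, which is exactly the value that has to be carried from the file -i step into the conversion step.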

Additional steps would be added for archiving a copy of the original,
converting xml files, etc.

Attempting to process these with NiFi leaves me unsure how to structure
the flow within the tool.  If I want to use ConvertCharacterSet, I have to
know the input character set.  I set up an ExecuteStreamCommand to run
file -i ${absolute.path:append(${filename})}, which returned the expected
values.  I don't see a way to turn these results into input for that
processor, which doesn't accept Expression Language for that field.

I also considered ConvertCSVToAvro as an interim step but noticed the same
issue.  Any suggestions on what this dataflow should look like?

Charlie
