Re: ConvertCharacterSet

Charlie Frasure Wed, 28 Oct 2015 08:47:17 -0700

I saw the patch you added for NIFI-1077.  Thanks!  Do you plan to add an
issue for the ExecuteStreamCommand output, or should I be looking into
NIFI-190 that Bryan mentioned?


On Tue, Oct 27, 2015 at 5:30 PM, Joe Percivall <[email protected]>
wrote:

> No one responded with concerns regarding allowing expression language for
> the input/output character set so I created a jira [1]. This use-case is
> something that should be easy for NiFi and the flow for this use-case is
> definitely more of a hack job than it should be.
>
> Does anyone have objections to adding a configuration option that writes
> the output of ExecuteStreamCommand to an attribute instead of the FlowFile
> contents?
>
> [1] https://issues.apache.org/jira/browse/NIFI-1077
>
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
>
>
> On Tuesday, October 27, 2015 5:15 PM, Charlie Frasure <
> [email protected]> wrote:
>
>
>
> Thank you both for the replies.  I built a flow that adds the "fragment"
> attributes early on, and splits the feed after the ExecuteStream that
> identifies the character set.  The character set payload goes through
> ExtractText to move it into an attribute and ReplaceText to delete the
> contents of the file.  The two streams are then funneled to a MergeContent
> using Defragment, which results in the original data with an extra blank
> line and the character set attribute attached.
>
> I suppose at this point I could route based on attributes for each
> character set or call another ExecuteStream to iconv.  This works, but
> seems a bit of a hack job.  Any suggestions for improvement?  Is this an
> expected use case for the tool?
>
>
> On Tue, Oct 27, 2015 at 10:45 AM, Bryan Bende <[email protected]> wrote:
>
> One problem with the above flow is that ExecuteStreamCommand will replace
> the contents of the FlowFile with the results of the command, so the
> FlowFile will have the encoding value and no longer have the original
> content.
> >
> >
> >This could potentially be solved in the future with the "hold file"
> processor [1], where the original file is held on one path while the same
> file goes to ExecuteStreamCommand. Once the encoding is returned, it could
> be extracted to an attribute and used to trigger release of the original
> file, copying over the encoding attribute.
> >
> >
> >[1] https://issues.apache.org/jira/browse/NIFI-190
> >
> >
> >
> >
> >
> >
> >On Tue, Oct 27, 2015 at 10:24 AM, Joe Percivall <[email protected]>
> wrote:
> >
> >Hey Charlie,
> >>
> >>Sorry no one has followed up with you yet. One way I see around
> ConvertCharacterSet not supporting expression language is to route on
> attribute (assuming the character set is extracted to be an attribute) to
> different ConvertCharacterSet processors depending on the input character
> set.
> >>
> >>That being said, I don't see a reason why the ConvertCharacterSet
> shouldn't support expression language. If no one has objections,
> I'll put in a ticket later today and knock it out real quick.
> >>
> >>
> >>Joe
> >>- - - - - -
> >>Joseph Percivall
> >>linkedin.com/in/Percivall
> >>e: [email protected]
> >>
> >>
> >>
> >>
> >>
> >>On Sunday, October 25, 2015 7:13 PM, Charlie Frasure <
> [email protected]> wrote:
> >>
> >>
> >>
> >>I'm looking to process many files into common formats.  The source files
> are coming in various character sets, mime types, and new line terminators.
> >>
> >>My thinking for a data flow was along these lines:
> >>
> >>GetFile (from many sub directories) ->
> >>ExecuteStreamCommand (file -i) ->
> >>ConvertCharacterSet (from previous command to utf8) ->
> >>ReplaceText (to change any \r\n into \n) ->
> >>PutFile (into a directory structure based on values found in the
> original file path and filename)
> >>
> >>Additional steps would be added for archiving a copy of the original,
> converting xml files, etc.
> >>
> >>Attempting to process these with NiFi leaves me confused as to how to
> process within the tool.  If I want to ConvertCharacterSet, I have to know
> the input type.  I set up an ExecuteStreamCommand to run file -i
> ${absolute.path:append(${filename})} which returned the expected values.  I
> don't see a way to turn these results into input for the processor, which
> doesn't accept expression language for that field.
> >>
> >>I also considered ConvertCSVToAvro as an interim step but noticed the
> same issue.  Any suggestions what this dataflow should look like?
> >>
> >>
> >>Charlie
> >>
> >
>
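
For readers following the thread, the detect-then-convert step from the
original question can be sketched outside NiFi. This is only an illustration
in Python, not how ConvertCharacterSet is implemented. It assumes that
`file -i` prints a line of the form `name: mime/type; charset=xxx`; the
function names parse_charset and to_utf8_unix are hypothetical.

```python
import codecs
from typing import Optional


def parse_charset(file_output: str) -> Optional[str]:
    """Extract the charset from a line such as
    '/data/in/a.csv: text/plain; charset=iso-8859-1'.

    Returns None when no usable charset is reported, so the caller can
    route the file to a failure path instead of guessing.
    """
    _, sep, tail = file_output.partition("charset=")
    if not sep:
        return None
    tokens = tail.split()
    name = tokens[0].rstrip(";").lower() if tokens else ""
    if not name:
        return None
    # Values like "binary" are not real codecs; reject anything
    # Python cannot decode with.
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name


def to_utf8_unix(raw: bytes, charset: str) -> bytes:
    """Decode with the detected charset, change \\r\\n into \\n,
    and re-encode as UTF-8."""
    text = raw.decode(charset)
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")


if __name__ == "__main__":
    # Assumed sample of what `file -i` might report for one input file.
    reported = "/data/in/a.csv: text/plain; charset=iso-8859-1"
    charset = parse_charset(reported)
    if charset is not None:
        converted = to_utf8_unix(b"caf\xe9\r\nbar\r\n", charset)
        print(converted)
```

Line endings are replaced after decoding rather than on the raw bytes, so
the same code also handles multi-byte encodings such as UTF-16, where the
byte pair 0x0D 0x0A does not necessarily mark a line break.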
