There is also a /SplitContent/ processor. Assuming you can recognize the boundaries between the different data types, you can split them up into separate flowfiles, then /MergeContent/ them back together later.

On 02/02/2017 04:19 PM, James McMahon wrote:
This is very helpful, Russell, but in my case each file is a mix of data types. So even if I determine that the flowfile is a mix, I'd still have to be poised to tackle it in my ExecuteScript script. Good suggestion, though, and one I can use in other ways in my workflows.

I do hope someone can tell me how to handle all of this in my callback's write-back. I'd like to better understand the error I'm getting, too. -Jim

On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman <[email protected]> wrote:

    Could you use /RouteOnContent/ to determine what sort of content
    you're dealing with, then branch to different /ExecuteScript/
    processors rigged to different Python scripts?

    Hope this comment is helpful.


    On 02/02/2017 03:38 PM, James McMahon wrote:

    I have a flowfile that has tagged character information I need to
    get at throughout the first few sections of the file. I need to
    use regex in python to select some of those values and to
    transform others. I am using an ExecuteScript processor to
    execute my python code. Here is my approach:

    = = = = =

    import logging

    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    class PyStreamCallback(StreamCallback):

        def __init__(self):
            pass

        def process(self, inputStream, outputStream):
            stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            # what happens to my binary and extreme chars when they get
            # passed through this step?

            # . . . (transform and pick out select content) . . .

            outputStream.write(bytearray(stuff.encode('utf-8')))
            # am I using the wrong functions to put my text chars and my
            # binary and my extreme chars back on the stream as a byte
            # stream? What should I be doing to handle the variety of data?

    flowFile = session.get()
    if flowFile != None:
        incoming = flowFile.getAttribute('filename')
        logging.info('about to process file: %s', incoming)
        flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
        session.transfer(flowFile, REL_SUCCESS)
        session.commit()

    = = = = =
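    (Editor's note: one common way to round-trip arbitrary bytes through
    text-oriented regex code is to decode with Latin-1, which maps every
    byte 0-255 to a code point losslessly. A minimal plain-Python sketch,
    not NiFi-specific; the <name> tag here is a made-up example:)

    ```python
    import re

    def transform(raw):
        """Edit tagged values in a byte stream without corrupting
        the non-text bytes that surround them."""
        # latin-1 maps each byte 0-255 to the same code point, so
        # decode followed by encode is a lossless byte round trip.
        text = raw.decode('latin-1')
        # Example edit: upper-case the contents of a hypothetical <name> tag.
        text = re.sub(r'(<name>)(.*?)(</name>)',
                      lambda m: m.group(1) + m.group(2).upper() + m.group(3),
                      text)
        return text.encode('latin-1')

    mixed = b'<name>jim</name>\x00\xff\xfe binary tail'
    print(transform(mixed))  # -> b'<name>JIM</name>\x00\xff\xfe binary tail'
    ```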

    When my incoming flowfile is all character content - such as
    tagged XML - my code works fine. But flowfiles that also contain
    some binary data and/or characters at the extremes, such as
    foreign-language characters, don't work. They error out. I
    suspect it has to do with the way I am writing back to the
    flowfile stream.

    Here is the error I am getting:

    org.apache.nifi.processor.exception.ProcessException:
    javax.script.ScriptException: TypeError: write(): 1st arg can't
    be coerced to int, byte[] in <script> at line number 155

    How should I handle the write back to the flowfile in cases where
    I have a mix of character and binary?

    Note: I must do this programmatically. I tried using a
    combination of SplitContent and MergeContent, but I have no
    consistent, reliable demarcation between the regular text
    characters and the other, more challenging characters that I
    could split on.

    All the examples I've found handle more pure circumstances than
    mine seems to be. For example, all text. Or all JSON. I've not
    yet been able to find an example that shows me how to write back
    to the stream for mixed data situations. Can you help?
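    (Editor's note: as background on why the UTF-8 path breaks, arbitrary
    binary bytes are often not valid UTF-8, so a strict decode raises, and
    a lenient decode silently substitutes U+FFFD so that re-encoding no
    longer reproduces the original bytes. A plain-Python illustration:)

    ```python
    raw = b'text \xff\xfe more'  # \xff and \xfe are not valid UTF-8

    # A strict decode refuses the bytes outright.
    try:
        raw.decode('utf-8')
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True
    print('strict decode failed:', strict_failed)

    # A lenient decode "succeeds" but replaces the bad bytes with U+FFFD,
    # so the round trip back to bytes is no longer lossless.
    lossy = raw.decode('utf-8', errors='replace')
    print('round trip intact:', lossy.encode('utf-8') == raw)
    ```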


