This is very helpful, Russell, but in my case each file is a mix of data types. So even if I determine that the flowfile is a mix, I'd still have to be poised to tackle it in my ExecuteScript script. Good suggestion, though, and one I can use in other ways in my workflows.
I do hope someone can tell me what I can do in my callback's write-back to handle all of these cases. I'd like to better understand this error I'm getting, too.

-Jim

On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman <[email protected]> wrote:

> Could you use *RouteOnContent* to determine what sort of content you're
> dealing with, then branch to different *ExecuteScript* processors rigged
> to different Python scripts?
>
> Hope this comment is helpful.
>
> On 02/02/2017 03:38 PM, James McMahon wrote:
>
> > I have a flowfile that has tagged character information I need to get at
> > throughout the first few sections of the file. I need to use regex in
> > Python to select some of those values and to transform others. I am using
> > an ExecuteScript processor to execute my Python code. Here is my approach:
> >
> > = = = = =
> > class PyStreamCallback(StreamCallback):
> >     def __init__(self):
> >         pass
> >     def process(self, inputStream, outputStream):
> >         # what happens to my binary and extreme chars when they get
> >         # passed through this step?
> >         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >         .
> >         . (transform and pick out select content)
> >         .
> >         # am I using the wrong functions to put my text chars and my
> >         # binary and my extreme chars back on the stream as a byte
> >         # stream? What should I be doing to handle the variety of data?
> >         outputStream.write(bytearray(stuff.encode('utf-8')))
> >
> > flowFile = session.get()
> > if (flowFile != None):
> >     incoming = flowFile.getAttribute('filename')
> >     logging.info('about to process file: %s', incoming)
> >     flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
> >     session.transfer(flowFile, REL_SUCCESS)
> >     session.commit()
> > = = = = =
> >
> > When my incoming flowfile is all character content - such as tagged xml -
> > my code works fine. All the flowfiles that also contain some binary data
> > and/or characters at the extremes, such as foreign language characters,
> > don't work. They error out. I suspect it has to do with the way I am
> > writing back to the flowfile stream.
> >
> > Here is the error I am getting:
> >
> > org.apache.nifi.processor.exception.ProcessException:
> > javax.script.ScriptException: TypeError: write(): 1st arg can't be
> > coerced to int, byte[] in <script> at line number 155
> >
> > How should I handle the write back to the flowfile in cases where I have a
> > mix of character and binary?
> >
> > Note: I must do this programmatically. I tried using a combination of
> > SplitContent and MergeContent, but I have no consistent, reliable
> > demarcation between the regular text characters and the other more
> > challenging characters that I can split on.
> >
> > All the examples I've found handle more pure circumstances than mine seems
> > to be. For example, all text. Or all JSON. I've not yet been able to find
> > an example that shows me how to write back to the stream for mixed data
> > situations. Can you help?
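
One approach that is sometimes used for mixed text-and-binary content is to skip UTF-8 decoding altogether and round-trip the bytes through ISO-8859-1 (Latin-1). That encoding maps every byte value 0-255 to exactly one code point, so decoding cannot fail and re-encoding returns the untouched regions byte for byte. The sketch below is plain Python, not NiFi code. The function name `transform_payload`, the `<id>` tag, and the sample data are hypothetical and only there for illustration. Inside an ExecuteScript callback the raw bytes would presumably be read from the input stream (commons-io offers `IOUtils.toByteArray` for that) and the result written back as a byte array, but that wiring is an assumption and is not shown or tested here.

```python
import re

def transform_payload(raw):
    """Apply a regex edit to the tagged text in `raw` (bytes) while
    leaving any binary regions byte-for-byte intact."""
    # ISO-8859-1 maps each byte 0..255 to one code point, so this
    # decode never raises and is fully reversible.
    text = raw.decode("iso-8859-1")
    # Hypothetical transform: blank out the value of an ASCII <id> tag.
    # Patterns should target ASCII markers only, because multi-byte
    # UTF-8 characters show up here as several Latin-1 characters.
    text = re.sub(r"<id>.*?</id>", "<id>REDACTED</id>", text)
    # Re-encoding with the same codec restores the original bytes
    # everywhere the regex did not touch.
    return text.encode("iso-8859-1")

# Made-up sample: tagged text, raw binary, then UTF-8 encoded "café".
binary_tail = bytes([0x00, 0xFF, 0xFE, 0x80])
sample = b"<id>12345</id>" + binary_tail + "café".encode("utf-8")
result = transform_payload(sample)
```

The replacement text must itself stay within the Latin-1 range, otherwise the final encode would raise. Transforms that change case or insert characters above U+00FF would need extra care.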
