James, I haven't had a chance to dig into this yet, but one thing I noticed in your script matches an issue that Bryan Rosander (NiFi committer and all-around good guy :) identified as the probable cause of the TypeError: calling bytearray() on the result of encode(), which already returns a byte array [1]. Does removing the call to bytearray() fix your script, or are there still issues with decoding the input stream?
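On the decoding question, here is a minimal sketch in plain Python (not an actual ExecuteScript body) of one way to avoid the text round trip altogether: run the regex over the raw bytes so that anything that is not valid UTF-8 passes through untouched. The `<title>` tag, the `transform` helper, and the sample data are all hypothetical and used only for illustration. How the bytes would be read from and written to the NiFi streams is not shown.

```python
import re

# Hypothetical tag layout, for illustration only: <title>...</title>.
# A bytes pattern (br"...") lets the regex run directly on undecoded content.
TITLE = re.compile(br"<title>(.*?)</title>", re.DOTALL)


def transform(raw):
    """Upper-case the ASCII text inside <title> tags; leave all other bytes alone."""
    def upper_title(match):
        return b"<title>" + match.group(1).upper() + b"</title>"
    return TITLE.sub(upper_title, raw)


# Mixed content: ASCII tags, UTF-8 encoded text, and bytes that are not valid UTF-8.
sample = b"<title>report</title>" + u"caf\u00e9".encode("utf-8") + b"\xff\xfe\x00\x80"
result = transform(sample)

# For comparison: decoding as UTF-8 with replacement and re-encoding is lossy,
# because each invalid byte becomes U+FFFD on the way in.
round_tripped = sample.decode("utf-8", "replace").encode("utf-8")
```

In this sketch `result` keeps the trailing binary bytes exactly as they were, while `round_tripped` no longer equals `sample`. That kind of loss is what to watch for whenever mixed content is converted to a string and back.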

Regards,
Matt

[1] https://community.hortonworks.com/questions/81291/nifi-executescript-processor-error-using-string-in.html

On Thu, Feb 2, 2017 at 5:38 PM, James McMahon <[email protected]> wrote:
> I have a flowfile that has tagged character information I need to get at
> throughout the first few sections of the file. I need to use regex in python
> to select some of those values and to transform others. I am using an
> ExecuteScript processor to execute my python code. Here is my approach:
>
> = = = = =
>
> class PyStreamCallback(StreamCallback) :
>     def __init__ (self) :
>     def process(self, inputSteam, outputStream) :
>         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8) # what
> happens to my binary and extreme chars when they get passed through this
> step?
>         .
>         . (transform and pick out select content)
>         .
>         outputStream.write(bytearray(stuff.encode(‘utf-8’)))) # am I using
> the wrong functions to put my text chars and my binary and my extreme chars
> back on the stream as a byte stream? What should I be doing to handle the
> variety of data?
>
> flowFile = session.get()
> if (flowFile!= None)
>     incoming = flowFile.getAttribute(‘filename’)
>     logging.info(‘about to process file: %s’, incoming)
>     flowFile = session.write(flowFile, PyStreamCallback()) # line 155 in my
> code
>     session.transfer(flowFile, REL_SUCCESS)
>     session.commit()
>
> = = = = =
>
> When my incoming flowfile is all character content - such as tagged xml - my
> code works fine. All the flowfiles that also contain some binary data and/or
> characters at the extremes such as foreign language characters don’t work.
> They error out. I suspect it has to do with the way I am writing back to the
> flowfile stream.
>
> Here is the error I am getting:
>
> Org.apache.nifi.processor.exception.ProcessException:
> javax.script.ScriptException: TypeError: write(): 1st arg can’t be coerced
> to int, byte[] in <script> at line number 155
>
> How should I handle the write back to the flowfile in cases where I have a
> mix of character and binary?
>
> Note: I must do this programmatically. I tried using a combination of
> SplitContent and MergeContent, but I have no consistent reliable demarcation
> between the regular text characters and the other more challenging
> characters that I can split on.
>
> All the examples I've found handle more pure circumstances than mine seems
> to be. For example, all text. Or all JSON. I've not yet been able to find an
> example that shows me how to write back to the stream for mixed data
> situations. Can you help?
