Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

Russell Bateman Thu, 02 Feb 2017 15:02:38 -0800

Could you use /RouteOnContent/ to determine what sort of content you're dealing with, then branch to different /ExecuteScript/ processors rigged to different Python scripts?
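
As a rough illustration of the "determine what sort of content" step, here is a plain-Python sketch of one possible heuristic: treat the content as text only if it decodes cleanly as UTF-8 and contains no NUL bytes. This is not how RouteOnContent itself decides anything (that processor matches content against regular expressions), and the route names "text" and "mixed" are hypothetical.

```python
def looks_like_utf8_text(data):
    """Return True if the raw bytes decode cleanly as UTF-8 and hold no NUL bytes."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def choose_route(data):
    # "text" and "mixed" are hypothetical route names used only for illustration.
    return "text" if looks_like_utf8_text(data) else "mixed"


text_only = b"<tag>caf\xc3\xa9</tag>"        # valid UTF-8, including a non-ASCII character
with_binary = b"<tag>x</tag>\xff\xfe\x00"    # trailing bytes that are not valid UTF-8

print(choose_route(text_only))
print(choose_route(with_binary))
```

A check like this only says whether a UTF-8 decode is safe; it does not locate where the text ends and the binary begins.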


Hope this comment is helpful.



On 02/02/2017 03:38 PM, James McMahon wrote:

I have a flowfile that has tagged character information I need to get at throughout the first few sections of the file. I need to use regex in python to select some of those values and to transform others. I am using an ExecuteScript processor to execute my python code. Here is my approach:
= = = = =

# imports assumed from the rest of the script
import logging
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):

    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # what happens to my binary and extreme chars when they get passed through this step?
        stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # .
        # . (transform and pick out select content)
        # .
        # am I using the wrong functions to put my text chars and my binary and my extreme
        # chars back on the stream as a byte stream? What should I be doing to handle the
        # variety of data?
        outputStream.write(bytearray(stuff.encode('utf-8')))

flowFile = session.get()

if flowFile is not None:
    incoming = flowFile.getAttribute('filename')
    logging.info('about to process file: %s', incoming)
    flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
    session.transfer(flowFile, REL_SUCCESS)

session.commit()

= = = = =
When my incoming flowfile is all character content - such as tagged xml - my code works fine. All the flowfiles that also contain some binary data and/or characters at the extremes such as foreign language characters don’t work. They error out. I suspect it has to do with the way I am writing back to the flowfile stream.
Here is the error I am getting:
org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: TypeError: write(): 1st arg can't be coerced to int, byte[] in <script> at line number 155
How should I handle the write back to the flowfile in cases where I have a mix of character and binary?
Note: I must do this programmatically. I tried using a combination of SplitContent and MergeContent, but I have no consistent reliable demarcation between the regular text characters and the other more challenging characters that I can split on.
All the examples I've found handle more pure circumstances than mine seems to be. For example, all text. Or all JSON. I've not yet been able to find an example that shows me how to write back to the stream for mixed data situations. Can you help?
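
One possible direction for the mixed-content case, sketched in plain Python rather than as a NiFi script: never decode the content as UTF-8 at all, and instead run the regex directly over the raw bytes so that everything the pattern does not touch passes through unchanged. The <title> tag and the upper-casing transform are hypothetical stand-ins for the real selection logic. Inside the callback, the raw bytes could presumably be read with IOUtils.toByteArray(inputStream) instead of IOUtils.toString; how best to move between a Java byte[] and a Python byte string under Jython is an assumption left out of this sketch.

```python
import re

# Hypothetical tag; a bytes pattern lets the regex run on undecoded content.
TITLE_PATTERN = re.compile(br"<title>(.*?)</title>", re.DOTALL)


def transform(raw):
    """Upper-case the text inside <title> tags; every other byte is left untouched."""
    def upper_title(match):
        return b"<title>" + match.group(1).upper() + b"</title>"
    return TITLE_PATTERN.sub(upper_title, raw)


# Tagged text followed by bytes that are not valid UTF-8.
sample = b"<title>report</title>\x00\xff\xfe\x89PNG binary tail"
result = transform(sample)

# Decoding as UTF-8 with replacement and re-encoding alters the invalid bytes,
# so the binary portion does not survive the round trip.
lossy = sample.decode("utf-8", errors="replace").encode("utf-8")

# latin-1 maps each byte value 0-255 to the code point with the same number,
# so this round trip returns the original bytes. It is an alternative when the
# transform needs a text string instead of bytes.
lossless = sample.decode("latin-1").encode("latin-1")

print(result == sample)    # the title text changed
print(lossy == sample)     # the binary tail was damaged
print(lossless == sample)  # byte-for-byte identical
```

The trade-off with the latin-1 route is that multi-byte UTF-8 characters show up as several separate characters while the data is in string form, so patterns that need to match such characters are easier to write against the bytes directly.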
