Could you use RouteOnContent to determine what sort of content you're
dealing with, then branch to different ExecuteScript processors, each
rigged to a different Python script?
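Here is a rough illustration of the "determine what sort of content" step. It is a hypothetical Python helper: classify_content is my own name and is not part of NiFi. It uses a simple heuristic that treats content as text if it has no NUL bytes and decodes cleanly as UTF-8. This is not how RouteOnContent itself decides; RouteOnContent matches regular expressions that you configure on it. Under this heuristic, a file that mixes text and binary would land in the binary branch.

```python
def classify_content(data: bytes) -> str:
    """Hypothetical helper: label raw bytes as 'text' or 'binary'.

    Heuristic (an assumption, not RouteOnContent's logic): content is 'text'
    if it contains no NUL bytes and decodes cleanly as strict UTF-8.
    """
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("utf-8")  # strict mode raises on malformed sequences
    except UnicodeDecodeError:
        return "binary"
    return "text"


# Each label could map to a separate ExecuteScript branch with its own script.
print(classify_content("<name>José</name>".encode("utf-8")))  # text
print(classify_content(b"<img>\xff\xd8\xff\x00</img>"))       # binary
```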
Hope this comment is helpful.
On 02/02/2017 03:38 PM, James McMahon wrote:
I have a flowfile containing tagged character information that I need
to get at throughout the first few sections of the file. I need to use
regular expressions in Python to select some of those values and to
transform others. I am using an ExecuteScript processor to run my
Python code. Here is my approach:
= = = = =
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import logging

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # What happens to my binary and extreme (non-ASCII) chars when
        # they get passed through this step?
        stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # .
        # . (transform and pick out select content)
        # .
        # Am I using the wrong functions to put my text chars, my binary,
        # and my extreme chars back on the stream as a byte stream? What
        # should I be doing to handle the variety of data?
        outputStream.write(bytearray(stuff.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    incoming = flowFile.getAttribute('filename')
    logging.info('about to process file: %s', incoming)
    flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
= = = = =
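The first question in the comments above can be illustrated with a CPython 3 analogy. This is not Jython, and it assumes that IOUtils.toString with UTF_8 substitutes U+FFFD for malformed input, as Java's InputStreamReader does by default. Under that assumption, bytes that are not valid UTF-8 are replaced on read, so they can never be written back unchanged. The sample bytes are made up.

```python
# Valid UTF-8 text ("café") followed by bytes that are not valid UTF-8.
raw = b"<tag>caf\xc3\xa9</tag>\xff\xfe\x00\x81"

# Decode with replacement (mimicking a lenient UTF-8 toString), then re-encode.
text = raw.decode("utf-8", errors="replace")
round_trip = text.encode("utf-8")

print(round_trip == raw)      # False: \xff, \xfe and \x81 became U+FFFD
print(text.count("\ufffd"))   # 3
```

The accented é survives because it is valid UTF-8. Only the invalid bytes are lost, which suggests part of the damage happens on the read, before the write call is ever reached.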
When my incoming flowfile is all character content, such as tagged
XML, my code works fine. Every flowfile that also contains some binary
data and/or characters at the extremes, such as foreign-language
(non-ASCII) characters, fails with an error. I suspect it has to do
with the way I am writing back to the flowfile stream.
Here is the error I am getting:
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: TypeError: write(): 1st arg can't be
coerced to int, byte[] in <script> at line number 155
How should I handle writing back to the flowfile in cases where I
have a mix of character and binary data?
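One way to avoid decoding entirely is to run the regular expressions on raw bytes. The sketch below is plain CPython 3 with made-up sample data. In the NiFi callback, the bytes would presumably come from something like IOUtils.toByteArray(inputStream). I have not verified how Jython hands the resulting Java byte[] to the re module, so treat that step as an assumption.

```python
import re

# Hypothetical sample: tagged text, a UTF-8 accented name, and raw binary bytes.
raw = b"<id>42</id><name>Jos\xc3\xa9</name>\x00\xff\xd8<note>draft</note>"

# Select: bytes patterns (rb"...") match directly against bytes; nothing is decoded.
ids = re.findall(rb"<id>(\d+)</id>", raw)

# Transform: rewrite the <note> value; every other byte passes through untouched.
out = re.sub(rb"<note>.*?</note>", b"<note>final</note>", raw)

print(ids)                                      # [b'42']
print(out == raw.replace(b"draft", b"final"))   # True
```

There are two caveats. First, any non-ASCII text in a pattern has to be written as its encoded bytes (for example b"Jos\xc3\xa9" for UTF-8). Second, a pattern could in principle match stray bytes inside a binary section.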
Note: I must do this programmatically. I tried using a combination of
SplitContent and MergeContent, but there is no consistent, reliable
demarcation between the regular text characters and the more
challenging characters that I could split on.
All the examples I've found handle simpler cases than mine: all text,
for example, or all JSON. I have not yet been able to find an example
that shows how to write back to the stream when the data is mixed.
Can you help?
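Another option keeps a string-based workflow but decodes with ISO-8859-1 (latin-1) instead of UTF-8. Latin-1 maps each of the 256 byte values to exactly one character, so decoding and re-encoding with it reproduces every byte that the regex does not change. The sketch below is CPython 3, with io.BytesIO standing in for NiFi's input and output streams. The function name transform_mixed and the <id> redaction are hypothetical. In the original callback, the analogous change would presumably be StandardCharsets.ISO_8859_1 in IOUtils.toString and .encode('latin-1') on the way out, though I have not tested that under Jython.

```python
import io
import re


def transform_mixed(input_stream, output_stream):
    """Hypothetical stand-in for PyStreamCallback.process using file-like objects.

    Latin-1 maps bytes 0x00-0xFF one-to-one onto characters U+0000-U+00FF,
    so bytes the regex leaves alone (text or binary) are written back unchanged.
    """
    stuff = input_stream.read().decode("latin-1")
    # Example transformation (an assumption): redact every numeric <id> value.
    stuff = re.sub(r"<id>\d+</id>", "<id>***</id>", stuff)
    output_stream.write(stuff.encode("latin-1"))


src = io.BytesIO(b"<id>42</id>\x00\xff<name>Jos\xc3\xa9</name>")
dst = io.BytesIO()
transform_mixed(src, dst)
print(dst.getvalue())  # b'<id>***</id>\x00\xff<name>Jos\xc3\xa9</name>'
```

The trade-off is that patterns see UTF-8 text as its individual bytes, so a non-ASCII character like é has to be matched as the two characters "Ã©". Keep replacement text to plain ASCII (or other latin-1 characters) so the final encode cannot fail.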