Thank you very much, Matt. I would be most interested in any insights you gain if you are able to recreate the problem.

If you have a moment, could you offer a line of code showing how one might wrap a call around the byte stream so that the bytes can be treated as a string and matched against, for instance, a compiled re pattern? I will definitely look more closely at the Oracle docs link you provided. An example would help me when I tackle this.

-Jim

On Thu, Feb 2, 2017 at 6:56 PM, Matt Burgess wrote:
> James,
>
> If you'd rather work with the inputStream as bytes, you don't need the
> IOUtils.toString() call, and I'm not sure what a UTF-8 charset would
> do to your mixed data. You can wrap any of the *InputStream
> decorators around the inputStream object, such as DataInputStream [1],
> to read various data types from the underlying bytes in the stream.
> Alternatively, you may want to read all the bytes into an array you can
> work with directly via Jython methods instead of using Java I/O.
>
> What's weird about the TypeError is that it looks like it is calling a
> different write() method than I would've expected. I wonder if the
> translation of Jython to Java objects is somehow preventing the
> processor from matching up a method signature. If the error is not
> occurring in the redacted code block above, I will give this script a
> try to see if I can reproduce and/or fix the error.
>
> Regards,
> Matt
>
> [1] https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html
>
> On Thu, Feb 2, 2017 at 6:19 PM, James McMahon wrote:
> > This is very helpful, Russell, but in my case each file is a mix of data
> > types. So even if I determine that the flowfile is a mix, I'd still have
> > to be poised to tackle it in my ExecuteScript script. Good suggestion,
> > though, and one I can use in other ways in my workflows.
> >
> > I do hope someone can tell me what I can do in my callback's write-back
> > to handle all of it. I'd like to better understand this error I'm
> > getting, too.
> > -Jim
> >
> > On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman wrote:
> >>
> >> Could you use RouteOnContent to determine what sort of content you're
> >> dealing with, then branch to different ExecuteScript processors rigged
> >> to different Python scripts?
> >>
> >> Hope this comment is helpful.
> >>
> >> On 02/02/2017 03:38 PM, James McMahon wrote:
> >>
> >> I have a flowfile that has tagged character information I need to get
> >> at throughout the first few sections of the file. I need to use regex in
> >> Python to select some of those values and to transform others. I am
> >> using an ExecuteScript processor to execute my Python code. Here is my
> >> approach:
> >>
> >> = = = = =
> >>
> >> import logging
> >> from org.apache.commons.io import IOUtils
> >> from java.nio.charset import StandardCharsets
> >> from org.apache.nifi.processor.io import StreamCallback
> >>
> >> class PyStreamCallback(StreamCallback):
> >>
> >>     def __init__(self):
> >>         pass
> >>
> >>     def process(self, inputStream, outputStream):
> >>         # What happens to my binary and extreme chars when they get
> >>         # passed through this step?
> >>         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >>         # ...
> >>         # ... (transform and pick out select content)
> >>         # ...
> >>         # Am I using the wrong functions to put my text chars, my binary,
> >>         # and my extreme chars back on the stream as a byte stream?
> >>         # What should I be doing to handle the variety of data?
> >>         outputStream.write(bytearray(stuff.encode('utf-8')))
> >>
> >> flowFile = session.get()
> >> if flowFile is not None:
> >>     incoming = flowFile.getAttribute('filename')
> >>     logging.info('about to process file: %s', incoming)
> >>     flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
> >>     session.transfer(flowFile, REL_SUCCESS)
> >>     session.commit()
> >>
> >> = = = = =
> >>
> >> When my incoming flowfile is all character content, such as tagged XML,
> >> my code works fine.
> >> All the flowfiles that also contain some binary data and/or characters
> >> at the extremes, such as foreign-language characters, don't work. They
> >> error out. I suspect it has to do with the way I am writing back to the
> >> flowfile stream.
> >>
> >> Here is the error I am getting:
> >>
> >> org.apache.nifi.processor.exception.ProcessException:
> >> javax.script.ScriptException: TypeError: write(): 1st arg can't be
> >> coerced to int, byte[] in <script> at line number 155
> >>
> >> How should I handle the write-back to the flowfile in cases where I have
> >> a mix of character and binary data?
> >>
> >> Note: I must do this programmatically. I tried using a combination of
> >> SplitContent and MergeContent, but I have no consistent, reliable
> >> demarcation between the regular text characters and the other, more
> >> challenging characters that I can split on.
> >>
> >> All the examples I've found handle purer cases than mine: for example,
> >> all text, or all JSON. I've not yet been able to find an example that
> >> shows me how to write back to the stream for mixed-data situations. Can
> >> you help?
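A minimal sketch of the "read all the bytes and work with them directly" idea Matt mentions, written as plain Python 3 rather than Jython inside ExecuteScript. The sample payload, the `<title>` pattern, and the names `transform`, `find_title`, `TITLE_RE`, and `TITLE_BYTES_RE` are hypothetical. It shows two ways to run a compiled re pattern over mixed content without corrupting it: decode with ISO-8859-1 (latin-1), which maps every byte value to exactly one character and therefore round-trips arbitrary binary data, or compile a bytes pattern and match the raw bytes with no decoding at all. For contrast, it also shows that decoding non-UTF-8 bytes as UTF-8 with replacement loses the original byte values. In the original callback, the analogous change would presumably be `IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1)` on the way in and `stuff.encode('iso-8859-1')` on the way out (or `IOUtils.toByteArray(inputStream)` to skip decoding), but that is an untested assumption about the Jython environment, and it does not by itself explain the TypeError.

```python
import re

# Hypothetical sample payload: tagged text, a UTF-8 encoded non-ASCII name,
# and a few raw bytes (0x00, 0xFF, 0xFE, 0x80) that are not valid UTF-8.
SAMPLE = (b"<title>Report</title>\n"
          b"<author>Jos\xc3\xa9</author>\n"
          b"\x00\xff\xfe\x80binary tail")

# ISO-8859-1 (latin-1) maps each byte 0x00-0xFF to exactly one code point
# U+0000-U+00FF, so decoding and re-encoding reproduces the input bytes.
LOSSLESS = "latin-1"

# Hypothetical patterns: one for decoded text, one for raw bytes.
TITLE_RE = re.compile(r"<title>(.*?)</title>")
TITLE_BYTES_RE = re.compile(rb"<title>(.*?)</title>")


def transform(raw):
    """Treat raw bytes as a string, rewrite the title, return new bytes."""
    text = raw.decode(LOSSLESS)   # one character per byte; cannot fail
    text = TITLE_RE.sub("<title>Redacted</title>", text)
    return text.encode(LOSSLESS)  # every character is <= U+00FF, so this round-trips


def find_title(raw):
    """Alternative: match a compiled bytes pattern with no decoding at all."""
    m = TITLE_BYTES_RE.search(raw)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(transform(SAMPLE))
    print(find_title(SAMPLE))
    # Contrast: a UTF-8 decode with replacement turns each invalid byte into
    # U+FFFD, so re-encoding does not give back the original bytes.
    lossy = SAMPLE.decode("utf-8", errors="replace").encode("utf-8")
    print(lossy == SAMPLE)
```

One trade-off of the latin-1 route: a multi-byte UTF-8 character such as the é in the sample appears in the decoded string as two characters (Ã©), so patterns are best anchored on ASCII tags, with the surrounding content treated as opaque.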
