Can you share with us a little more information about the schema/format of your incoming data? Is there always a tag before a data item, for example?
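
In the meantime, here is a minimal sketch of one way to treat the raw bytes as a string that a compiled re pattern can match against. It is plain Python, not a tested ExecuteScript body. The function name extract_and_rewrite, the <name> tag, and the sample bytes are all made up for illustration, since the real tags in your data are not known yet. The idea is to decode with ISO-8859-1 (latin-1) instead of UTF-8: latin-1 maps every byte value 0-255 to exactly one character, so decoding and re-encoding returns the original bytes, whereas UTF-8 decoding can reject or alter byte sequences that are not valid UTF-8.

```python
import re

# Hypothetical tag: stands in for whatever tagged fields the real data has.
NAME_PATTERN = re.compile(r'<name>(.*?)</name>', re.DOTALL)


def extract_and_rewrite(raw):
    """Return (matched values, rewritten bytes) for a mixed text/binary blob."""
    # latin-1 gives a one-to-one byte <-> character mapping, so binary
    # sections pass through the decode/encode round trip unchanged.
    text = raw.decode('latin-1')

    # Select values with the compiled pattern.
    names = NAME_PATTERN.findall(text)

    # Transform the tagged values (here: trim surrounding whitespace).
    # The replacement must only contain characters that latin-1 can
    # encode, otherwise the encode step below will fail.
    rewritten = NAME_PATTERN.sub(
        lambda m: '<name>' + m.group(1).strip() + '</name>', text)

    return names, rewritten.encode('latin-1')


# Made-up sample: a tagged field followed by bytes that are not valid UTF-8.
sample = b'<name>  report.xml </name>\x00\xff\xfe\x80 binary \xe9'
names, rewritten = extract_and_rewrite(sample)
print(names)
print(rewritten)
```

Inside ExecuteScript the raw bytes would have to be read from the inputStream and the result written back to the outputStream. How best to convert between a Java byte[] and a Jython string there is not something this sketch verifies.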

Thanks,
Matt

On Thu, Feb 2, 2017 at 8:26 PM, James McMahon <[email protected]> wrote:
> Thank you very much Matt. I would be most interested in any insights you
> gain if you are able to recreate the problem.
>
> If you have a moment, can you offer up a line of code showing how one might
> wrap a call around the byte stream to treat the bytes as a string that can
> be matched against using, for instance, a compiled re pattern? I will
> definitely look more closely at the Oracle docs link you provided. An
> example would help me when I tackle this. -Jim
>
> On Thu, Feb 2, 2017 at 6:56 PM, Matt Burgess <[email protected]> wrote:
>>
>> James,
>>
>> If you'd rather work with the inputStream as bytes, you don't need the
>> IOUtils.toString() call, and I'm not sure what a UTF-8 charset would
>> do to your mixed data. You can wrap any of the *InputStream
>> decorators around the inputStream object, such as DataInputStream [1]
>> to read various data types from the underlying bytes in the stream.
>> Alternatively you may want to read all the bytes into an array you can
>> work with directly via Jython methods instead of using Java I/O.
>>
>> What's weird about the TypeError is that it looks like it is calling a
>> different write() method than I would've expected. I wonder if the
>> translation of Jython to Java objects is somehow making the processor
>> not be able to match up a method signature. If the error is not
>> occurring in the redacted code block above, I will give this script a
>> try, to see if I can reproduce and/or fix the error.
>>
>> Regards,
>> Matt
>>
>> [1] https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html
>>
>>
>> On Thu, Feb 2, 2017 at 6:19 PM, James McMahon <[email protected]> wrote:
>> > This is very helpful Russell, but in my case each file is a mix of data
>> > types. So even if I determine that the flowfile is a mix, I'd still have
>> > to be poised to tackle it in my ExecuteScript script. Good suggestion,
>> > though, and one I can use in other ways in my workflows.
>> >
>> > I do hope someone can tell me what I can do in my callback's write back
>> > to handle all of it. I'd like to better understand this error I'm
>> > getting, too.
>> > -Jim
>> >
>> > On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman <[email protected]> wrote:
>> >>
>> >> Could you use RouteOnContent to determine what sort of content you're
>> >> dealing with, then branch to different ExecuteScript processors rigged
>> >> to different Python scripts?
>> >>
>> >> Hope this comment is helpful.
>> >>
>> >>
>> >> On 02/02/2017 03:38 PM, James McMahon wrote:
>> >>
>> >> I have a flowfile that has tagged character information I need to get
>> >> at throughout the first few sections of the file. I need to use regex
>> >> in python to select some of those values and to transform others. I am
>> >> using an ExecuteScript processor to execute my python code. Here is my
>> >> approach:
>> >>
>> >> = = = = =
>> >>
>> >> class PyStreamCallback(StreamCallback):
>> >>     def __init__(self):
>> >>         pass
>> >>
>> >>     def process(self, inputStream, outputStream):
>> >>         # what happens to my binary and extreme chars when they get
>> >>         # passed through this step?
>> >>         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>> >>         .
>> >>         . (transform and pick out select content)
>> >>         .
>> >>         # am I using the wrong functions to put my text chars and my
>> >>         # binary and my extreme chars back on the stream as a byte
>> >>         # stream? What should I be doing to handle the variety of data?
>> >>         outputStream.write(bytearray(stuff.encode('utf-8')))
>> >>
>> >> flowFile = session.get()
>> >> if (flowFile != None):
>> >>     incoming = flowFile.getAttribute('filename')
>> >>     logging.info('about to process file: %s', incoming)
>> >>     flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
>> >>     session.transfer(flowFile, REL_SUCCESS)
>> >>     session.commit()
>> >>
>> >> = = = = =
>> >>
>> >> When my incoming flowfile is all character content - such as tagged xml
>> >> - my code works fine. All the flowfiles that also contain some binary
>> >> data and/or characters at the extremes, such as foreign language
>> >> characters, don't work. They error out. I suspect it has to do with the
>> >> way I am writing back to the flowfile stream.
>> >>
>> >> Here is the error I am getting:
>> >>
>> >> org.apache.nifi.processor.exception.ProcessException:
>> >> javax.script.ScriptException: TypeError: write(): 1st arg can't be
>> >> coerced to int, byte[] in <script> at line number 155
>> >>
>> >> How should I handle the write back to the flowfile in cases where I
>> >> have a mix of character and binary?
>> >>
>> >> Note: I must do this programmatically. I tried using a combination of
>> >> SplitContent and MergeContent, but I have no consistent, reliable
>> >> demarcation between the regular text characters and the other more
>> >> challenging characters that I can split on.
>> >>
>> >> All the examples I've found handle more pure circumstances than mine
>> >> seems to be. For example, all text. Or all JSON. I've not yet been able
>> >> to find an example that shows me how to write back to the stream for
>> >> mixed data situations. Can you help?
