I've always found that IdentifyMimeType returns a very wide range of values for mime.type, and it is often unclear whether mime.type is a reliable indicator of the nature of the content. To illustrate, I passed a file.txt containing a string representation of JSON into NiFi. I'd expect this to be handled as textual data, but mime.type gets set to application/json;charset=UTF-8.
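For what it's worth, here is a rough sketch of the kind of check I have in mind, in case it helps frame the question. It strips parameters such as ";charset=UTF-8" from the value and treats text/* plus a short list of application types as character data. The function name is_textual and the list of "textual" application types are my own assumptions, not anything NiFi provides, and the list would surely need tuning for a real flow.

```python
# Hypothetical helper, not part of NiFi: decide whether a mime.type value
# describes character data. The set of "textual" application types below is
# an assumption and would need tuning for a real flow.
TEXTUAL_APPLICATION_TYPES = frozenset([
    "application/json",
    "application/xml",
    "application/javascript",
])


def is_textual(mime_type):
    """Return True if mime_type looks like character content."""
    if not mime_type:
        # Missing or empty attribute: make no claim that it is text.
        return False
    # Drop parameters such as ";charset=UTF-8" and normalize case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base.startswith("text/"):
        return True
    # Structured-syntax suffixes, e.g. application/ld+json.
    if base.endswith("+json") or base.endswith("+xml"):
        return True
    return base in TEXTUAL_APPLICATION_TYPES
```

In a script the value would presumably come from flowFile.getAttribute('mime.type'), so is_textual("application/json;charset=UTF-8") would send my JSON example down the text path while image/jpeg would go down the binary one.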
Perhaps I am misusing the mime.type attribute. How have you worked around this challenge, Joe?

On Fri, Nov 3, 2017 at 7:54 AM, Joe Witt <[email protected]> wrote:
> "How can I discern binary or character content using conditional checks
> to be sure I handle the file properly?"
>
> Use NiFi and the existing processors where able and extend/script only
> where necessary/critical. For the case you mention use
> IdentifyMimeType and route appropriate data to the appropriate script
> execution.
>
> Joe
>
> On Fri, Nov 3, 2017 at 7:04 AM, James McMahon <[email protected]> wrote:
> > Andy, regarding the code sample you offered above - doesn't this put
> > into text both the attributes metadata and the payload of the flowfile?
> >
> > If that is the case, how does one modify that to read in from the stream
> > into variable text only the file payload?
> >
> > On Fri, Nov 3, 2017 at 5:48 AM, James McMahon <[email protected]> wrote:
> >>
> >> Thank you Andy. I'd like to ask just a few quick follow up questions.
> >>
> >> 1- My flow content may be textual characters, and it can also be binary -
> >> jpgs, pngs, and similar. How can I discern binary or character content
> >> using conditional checks to be sure I handle the file properly? How
> >> would I alter this
> >>
> >>     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >>
> >> to read in the data from the stream as binary data in that case?
> >>
> >> 2- In the case where my data in the flowfile payload is binary, do I have
> >> another version of this....
> >>
> >>     outputStream.write(bytearray(reversedText.encode('utf-8')))
> >>
> >> ....that omits the encoding, like so:
> >>
> >>     outputStream.write(bytearray(some_binary)) ?
> >>
> >> Thank you very much in advance. -Jim
> >>
> >> On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto <[email protected]> wrote:
> >>>
> >>> James,
> >>>
> >>> The Python API should be the same as the Java FlowFile.java interface
> >>> [1]. Matt Burgess’ blog has a good post about using Jython to do
> >>> flowfile content manipulation [2]. Something like:
> >>>
> >>>     flowFile = session.get()
> >>>     if (flowFile != None):
> >>>         flowFile = session.write(flowFile, PyStreamCallback())
> >>>         session.transfer(flowFile, REL_SUCCESS)
> >>>
> >>> With PyStreamCallback declared as a class above that block in the script:
> >>>
> >>>     import java.io
> >>>     from org.apache.commons.io import IOUtils
> >>>     from java.nio.charset import StandardCharsets
> >>>     from org.apache.nifi.processor.io import StreamCallback
> >>>
> >>>     class PyStreamCallback(StreamCallback):
> >>>         def __init__(self):
> >>>             pass
> >>>         def process(self, inputStream, outputStream):
> >>>             text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >>>             reversedText = text[::-1]
> >>>             outputStream.write(bytearray(reversedText.encode('utf-8')))
> >>>
> >>> In Groovy, you can declare the StreamCallback as an inline closure to
> >>> make this more compact, but I believe in Jython it needs to be a
> >>> separate declaration. Hope this helps.
> >>>
> >>> [1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
> >>> [2] https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
> >>>
> >>> Andy LoPresto
> >>> [email protected]
> >>> [email protected]
> >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> >>>
> >>> On Nov 2, 2017, at 12:53 PM, James McMahon <[email protected]> wrote:
> >>>
> >>> In python, I can use the requests library to post content something
> >>> like this:
> >>>
> >>>     import requests
> >>>     url="https://abc.test.org"
> >>>     files={'file':open('/somedir/myfile.txt','rb')}
> >>>     r = requests.post(url,files=files)
> >>>
> >>> If I am in a python stream callback, how can I read the flowfile
> >>> payload in the same way that the open() reads its file from disk?
