MIME type detection can be a difficult business, but I trust Apache Tika to do a far better job than I ever could. The result you show for JSON appears correct, and I'd simply add that string to the list of mime.type values that I route as text. Or I'd key off the charset being provided, as that would tell me enough to know it is text, or however I wanted to treat it.
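To make that concrete, here is a rough Python sketch of the kind of check I mean. The helper name is_text_mime and the TEXT_LIKE_TYPES set are made up for illustration, not anything NiFi ships; in a real flow you would more likely express the same test on mime.type in RouteOnAttribute.

```python
# Hypothetical allow-list of non-"text/*" types to treat as text.
# This set is an example only, not a NiFi or Tika default.
TEXT_LIKE_TYPES = {"application/json", "application/xml"}


def is_text_mime(mime_type):
    """Return True if a mime.type value should be routed as text."""
    if not mime_type:
        return False
    # Split "type/subtype;param=value" into the base type and its parameters.
    parts = [p.strip().lower() for p in mime_type.split(";")]
    base, params = parts[0], parts[1:]
    if base.startswith("text/") or base in TEXT_LIKE_TYPES:
        return True
    # A charset parameter is a strong hint that the content is character data.
    return any(p.startswith("charset=") for p in params)
```

With this, application/json;charset=UTF-8 is treated as text either way (it is on the list and it carries a charset), while something like image/png is not.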
Thanks

On Fri, Nov 3, 2017 at 8:24 AM, James McMahon <[email protected]> wrote:
> I've always found that IdentifyMimeType returns a wide, wide range of values
> for mime.type. It is often unclear whether mime.type is a reliable
> indicator of the nature of the content. To illustrate, I've passed file.txt
> into NiFi that contains a string representation of json. I'd expect this to
> be handled as textual data, but mime.type gets set to
> application/json;charset=UTF-8.
>
> Perhaps I am misusing the attribute mime.type. How have you worked around
> this challenge Joe?
>
> On Fri, Nov 3, 2017 at 7:54 AM, Joe Witt <[email protected]> wrote:
>>
>> "How can I discern binary or character content using conditional checks
>> to be sure I handle the file properly?"
>>
>> Use NiFi and the existing processors where able and extend/script only
>> where necessary/critical. For the case you mention use
>> IdentifyMimeType and route appropriate data to the appropriate script
>> execution.
>>
>> Joe
>>
>> On Fri, Nov 3, 2017 at 7:04 AM, James McMahon <[email protected]>
>> wrote:
>> > Andy, regarding the code sample you offered above - doesn't this put
>> > into text both the attributes metadata and the payload of the flowfile?
>> >
>> > If that is the case, how does one modify that to read in from the stream
>> > into variable text only the file payload?
>> >
>> > On Fri, Nov 3, 2017 at 5:48 AM, James McMahon <[email protected]>
>> > wrote:
>> >>
>> >> Thank you Andy. I'd like to ask just a few quick follow up questions.
>> >>
>> >> 1- My flow content may be textual characters, and it can also be binary -
>> >> jpgs, pngs, and similar. How can I discern binary or character content
>> >> using conditional checks to be sure I handle the file properly? How
>> >> would I alter this
>> >>
>> >> text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>> >>
>> >> to read in the data from the stream as binary data in that case?
>> >>
>> >> 2- In the case where my data in the flowfile payload is binary, do I
>> >> have another version of this....
>> >>
>> >> outputStream.write(bytearray(reversedText.encode('utf-8')))
>> >>
>> >> ....that omits the encoding, like so:
>> >>
>> >> outputStream.write(bytearray(some_binary)) ?
>> >>
>> >> Thank you very much in advance. -Jim
>> >>
>> >> On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto <[email protected]>
>> >> wrote:
>> >>>
>> >>> James,
>> >>>
>> >>> The Python API should be the same as the Java FlowFile.java interface
>> >>> [1]. Matt Burgess’ blog has a good post about using Jython to do
>> >>> flowfile content manipulation [2]. Something like:
>> >>>
>> >>> flowFile = session.get()
>> >>> if (flowFile != None):
>> >>>     flowFile = session.write(flowFile,PyStreamCallback())
>> >>>     session.transfer(flowFile, REL_SUCCESS)
>> >>>
>> >>> With PyStreamCallback declared as a class above that block in the
>> >>> script:
>> >>>
>> >>> import java.io
>> >>> from org.apache.commons.io import IOUtils
>> >>> from java.nio.charset import StandardCharsets
>> >>> from org.apache.nifi.processor.io import StreamCallback
>> >>>
>> >>> class PyStreamCallback(StreamCallback):
>> >>>     def __init__(self):
>> >>>         pass
>> >>>     def process(self, inputStream, outputStream):
>> >>>         text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>> >>>         reversedText = text[::-1]
>> >>>         outputStream.write(bytearray(reversedText.encode('utf-8')))
>> >>>
>> >>> In Groovy, you can declare the StreamCallback as an inline closure to
>> >>> make this more compact, but I believe in Jython it needs to be a
>> >>> separate declaration. Hope this helps.
>> >>>
>> >>> [1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
>> >>> [2] https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>> >>>
>> >>> Andy LoPresto
>> >>> [email protected]
>> >>> [email protected]
>> >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>> >>>
>> >>> On Nov 2, 2017, at 12:53 PM, James McMahon <[email protected]>
>> >>> wrote:
>> >>>
>> >>> In python, I can use the requests library to post content something
>> >>> like this:
>> >>>
>> >>> import requests
>> >>> url="https://abc.test.org"
>> >>> files={'file':open('/somedir/myfile.txt','rb')}
>> >>> r = requests.post(url,files=files)
>> >>>
>> >>> If I am in a python stream callback, how can I read the flowfile
>> >>> payload in the same way that the open() reads its file from disk?
>> >>>
