I've always found that IdentifyMimeType returns a very wide range of
values for mime.type, so it is often unclear whether mime.type is a reliable
indicator of the nature of the content. To illustrate, I passed a file.txt
containing a string representation of JSON into NiFi. I'd expect this to
be handled as textual data, but mime.type gets set to
application/json;charset=UTF-8.

Perhaps I am misusing the attribute mime.type. How have you worked around
this challenge, Joe?
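
For what it's worth, one way to cope with the wide range of mime.type values might be to normalize the value before routing on it. The sketch below is only an illustration: is_textual and TEXTUAL_APPLICATION_TYPES are hypothetical names, not part of NiFi, and the list of application types treated as text is my own assumption and would need tuning for real data.

```python
# Hypothetical helper, not part of NiFi: decides whether a mime.type value
# should be handled as character data or as binary data.
# Assumption: these application/* types are textual for our purposes.
TEXTUAL_APPLICATION_TYPES = frozenset([
    "application/json",
    "application/xml",
    "application/javascript",
])


def is_textual(mime_type):
    """Return True if the mime.type value describes character data."""
    if not mime_type:
        # Missing or empty attribute: treat as binary to be safe.
        return False
    # Drop parameters such as ";charset=UTF-8" and normalize case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base.startswith("text/"):
        return True
    # Structured syntax suffixes, e.g. application/ld+json or image/svg+xml.
    if base.endswith("+json") or base.endswith("+xml"):
        return True
    return base in TEXTUAL_APPLICATION_TYPES
```

With this, application/json;charset=UTF-8 from my example would be treated as text, while image/jpeg and image/png would be treated as binary. Inside a script the value could presumably be taken from flowFile.getAttribute('mime.type') after IdentifyMimeType has run.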

On Fri, Nov 3, 2017 at 7:54 AM, Joe Witt <[email protected]> wrote:

> "How can I discern binary or character content using conditional checks
> to be sure I handle the file properly?"
>
> Use NiFi and the existing processors where able and extend/script only
> where necessary/critical.  For the case you mention use
> IdentifyMimeType and route appropriate data to the appropriate script
> execution.
>
> Joe
>
> On Fri, Nov 3, 2017 at 7:04 AM, James McMahon <[email protected]> wrote:
> > Andy, regarding the code sample you offered above - doesn't this put
> > into text both the attributes metadata and the payload of the flowfile?
> >
> > If that is the case, how does one modify that to read in from the stream
> > into variable text only the file payload?
> >
> > On Fri, Nov 3, 2017 at 5:48 AM, James McMahon <[email protected]> wrote:
> >>
> >> Thank you Andy. I'd like to ask just a few quick follow up questions.
> >>
> >> 1- My flow content may be textual characters, and it can also be binary -
> >> jpgs, pngs, and similar. How can I discern binary or character content using
> >> conditional checks to be sure I handle the file properly? How would I alter
> >> this
> >>
> >> text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >>
> >> to read in the data from the stream as binary data in that case?
> >>
> >> 2- In the case where my data in the flowfile payload is binary, do I have
> >> another version of this....
> >>
> >> outputStream.write(bytearray(reversedText.encode('utf-8')))
> >>
> >> ....that omits the encoding, like so:
> >>
> >> outputStream.write(bytearray(some_binary))  ?
> >>
> >> Thank you very much in advance. -Jim
> >>
> >> On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto <[email protected]>
> >> wrote:
> >>>
> >>> James,
> >>>
> >>> The Python API should be the same as the Java FlowFile.java interface
> >>> [1]. Matt Burgess’ blog has a good post about using Jython to do flowfile
> >>> content manipulation [2]. Something like:
> >>>
> >>> flowFile = session.get()
> >>> if flowFile is not None:
> >>>   # Rewrite the content through the callback, then route to success
> >>>   flowFile = session.write(flowFile, PyStreamCallback())
> >>>   session.transfer(flowFile, REL_SUCCESS)
> >>>
> >>> With PyStreamCallback declared as a class above that block in the script:
> >>>
> >>> import java.io
> >>> from org.apache.commons.io import IOUtils
> >>> from java.nio.charset import StandardCharsets
> >>> from org.apache.nifi.processor.io import StreamCallback
> >>>
> >>> class PyStreamCallback(StreamCallback):
> >>>   def __init__(self):
> >>>     pass
> >>>   def process(self, inputStream, outputStream):
> >>>     # Read the flowfile content from the stream as UTF-8 text
> >>>     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >>>     reversedText = text[::-1]
> >>>     # Write the reversed text back out as UTF-8 bytes
> >>>     outputStream.write(bytearray(reversedText.encode('utf-8')))
> >>>
> >>> In Groovy, you can declare the StreamCallback as an inline closure to
> >>> make this more compact, but I believe in Jython it needs to be a separate
> >>> declaration. Hope this helps.
> >>>
> >>> [1]
> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
> >>> [2]
> >>> https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
> >>>
> >>>
> >>> Andy LoPresto
> >>> [email protected]
> >>> [email protected]
> >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >>>
> >>> On Nov 2, 2017, at 12:53 PM, James McMahon <[email protected]> wrote:
> >>>
> >>> In python, I can use the requests library to post content something like
> >>> this:
> >>>
> >>> import requests
> >>> url = "https://abc.test.org"
> >>> files = {'file': open('/somedir/myfile.txt', 'rb')}
> >>> r = requests.post(url, files=files)
> >>>
> >>> If I am in a python stream callback, how can I read the flowfile payload
> >>> in the same way that the open() reads its file from disk?
> >>>
> >>>
> >>
> >
>
