Hi Chris,

Yes, you are correct that large payloads can be moved through NiFi.
As data moves through NiFi, what is actually passed around is a pointer to the data, referred to as a FlowFile. The content of the FlowFile is only accessed as needed. The key for large payloads is to operate on the content in a streaming fashion, so that you don't read too many large payloads into memory and exceed your JVM heap.

As an example, a typical pattern for bringing data into HDFS from NiFi is to use a MergeContent processor right before a PutHDFS processor. MergeContent can take many small/medium-size files and merge them together to form an appropriately sized file for HDFS. It does this by copying each of the input streams from the original files to a new output stream, and can therefore merge a large number of files without exceeding the memory of the JVM.

Hope that helps.

-Bryan

On Tue, Sep 1, 2015 at 10:51 AM, Chris Teoh <[email protected]> wrote:
> Hi,
>
> Thanks for the help thus far with NiFi. I'll get to implementing those
> suggestions posed to me from my earlier messages.
>
> I'm also considering other message bus products that may be suitable too.
> In doing so, I have read that these message bus systems like RabbitMQ
> typically don't have large message payloads sent through them. In one of
> the video demos I saw on YouTube, it appeared that NiFi was used to ingest
> data from a Twitter feed. Am I correct in assuming that I can transit large
> volumes of data through NiFi flows in and out of Hadoop?
>
> I'm thinking other message bus systems end up just handling the signaling
> and the bulk data transfers occur outside of the message bus and never
> transit through it.
>
> Kind regards
> Chris
>
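P.S. For what it's worth, the stream-copying idea I described is just the standard bounded-buffer copy loop. This is a minimal sketch of that pattern (not NiFi's actual MergeContent code); the class and method names here are mine, for illustration only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

public class MergeSketch {

    // Copy each input stream to the output in fixed-size chunks, so memory
    // use stays bounded by the buffer size no matter how large the inputs are.
    static void merge(List<InputStream> inputs, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        for (InputStream in : inputs) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Two small in-memory "files" stand in for incoming FlowFile content.
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        merge(List.of(
                new ByteArrayInputStream("part1".getBytes()),
                new ByteArrayInputStream("part2".getBytes())),
              merged);
        System.out.println(merged.toString());
    }
}
```

The point is just that only one 8 KB buffer is ever held in memory, regardless of how many inputs you merge or how large each one is.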
