Hi Chris,

Yes, you are correct that large payloads can be moved through NiFi.

As data moves through NiFi, what is passed around is a pointer to the data,
referred to as a FlowFile. The content of the FlowFile is only accessed as
needed.
The key for large payloads is to operate on the content in a streaming
fashion, so that you never read entire large payloads into memory and
exceed your JVM heap.
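As a rough illustration of the streaming idea (plain Java, not the actual NiFi API): copying content through a fixed-size buffer keeps heap use bounded no matter how large the payload is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch only: a processor would receive these streams from the framework.
// Memory use is limited to one buffer, independent of payload size.
public class StreamingCopy {
    static final int BUFFER_SIZE = 8192; // fixed, bounded working memory

    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stands in for a large FlowFile's content; only 8 KB is ever
        // held in the copy buffer at once.
        byte[] payload = new byte[1_000_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```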

As an example, a typical pattern for bringing data into HDFS from NiFi is
to use a MergeContent processor right before a PutHDFS processor.
MergeContent can take many small/medium-sized files
and merge them together to form an appropriately sized file for HDFS. It
does this by copying each of the input streams from the original files to a
new output stream, and can therefore merge a large number
of files without exceeding the memory of the JVM.
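The merge step can be sketched the same way (a hypothetical plain-Java illustration of the technique, not MergeContent's actual implementation): each input stream is drained into the shared output in turn, so only one small buffer is ever in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Sketch of the concatenation idea behind merging: stream each input
// into one output, holding at most one buffer's worth of data.
public class MergeSketch {
    static final int BUFFER_SIZE = 8192;

    static long merge(List<InputStream> inputs, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        for (InputStream in : inputs) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            in.close();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Two small "files" standing in for many incoming FlowFiles.
        List<InputStream> inputs = List.of(
            new ByteArrayInputStream("part-one;".getBytes()),
            new ByteArrayInputStream("part-two".getBytes()));
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        merge(inputs, merged);
        System.out.println(merged.toString());
    }
}
```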

Hope that helps.

-Bryan


On Tue, Sep 1, 2015 at 10:51 AM, Chris Teoh <[email protected]> wrote:

> Hi,
>
> Thanks for the help thus far with NiFi. I'll get to implementing those
> suggestions posed to me from my earlier messages.
>
> I'm also considering other message bus products that may be suitable too.
> In doing so, I have read that these message bus systems like RabbitMQ
> typically don't have large message payloads sent through them. In one of
> the video demos I saw on YouTube, it appeared that NiFi was used to ingest
> data from a Twitter feed. Am I correct in assuming that I can transit large
> volumes of data through NiFi flows in and out of Hadoop?
>
> I'm thinking other message bus systems end up just handling the signaling
> and the bulk data transfers occur outside of the message bus and never
> transit through it.
>
> Kind regards
> Chris
>
