Thanks Joe.

Following up on my earlier questions about multiple content repositories,
I still have a few more. :)

1. Is it correct to say that the use case for having multiple content
repositories is to take advantage of parallel disk writes, assuming that the
system has multiple bare-metal disk drives mounted? Are there any other
use cases for configuring multiple content repositories?

2. In an enterprise environment where NiFi writes to a SAN (Storage Area
Network), does it make sense to have logically mounted volumes for the
multiple content repositories, or are we better off with just one content
repository? The assumption here is that we are dealing with multiple
files of 10 to 50 gigabytes each.

3. Will NiFi have disk contention issues in a scenario where we have 5 or
more independent flows on a single NiFi instance and all the flows are
involved in ETL?
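
For context on questions 1 and 2, a multi-repository setup in
nifi.properties would look roughly like the following. The property prefix
is the one from the Administrator's Guide; the suffixes (content1, content2,
content3) and the mount paths are made-up examples, one per volume:

```properties
# Illustrative only: suffixes and paths are hypothetical, one entry per mounted volume
nifi.content.repository.directory.content1=/mnt/vol1/content_repository
nifi.content.repository.directory.content2=/mnt/vol2/content_repository
nifi.content.repository.directory.content3=/mnt/vol3/content_repository
```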

Regards,
Chris



On Fri, Nov 27, 2015 at 3:56 AM, Joe Witt <[email protected]> wrote:

> Chris,
>
> It is something which occurs automatically and behind the scenes.
> Under normal circumstances there will be many FlowFiles written to the
> same content claim; they'll just each have different offsets.  It is
> more aligned with how disks work in terms of efficiently writing data,
> efficiently reading data, and efficiently deleting the entire claim
> (which is a file on disk).  Rather than a delete per FlowFile, we
> delete once there are no more references to the entire claim.  Much
> faster.  And all of that is totally abstracted away from the
> perspective of someone writing extensions.  This bit, combined with
> the copy-on-write and pass-by-reference logic the content repository
> provides, is a key part of what makes NiFi efficient.
>
> Thanks
> Joe
>
> On Thu, Nov 26, 2015 at 1:40 AM, Chris Lim <[email protected]>
> wrote:
> > Thanks Mark.
> >
> > The answer on the content repository round-robin is perfect. :)
> >
> > It got me curious when you mentioned that one or more FlowFiles can be
> > written to the same Resource Claim. Is there a specific scenario in which
> > this can occur? Under normal circumstances, is there only one FlowFile
> > written to a Resource Claim?
> >
> > --
> > Chris
> >
> >
> > On Wed, Nov 25, 2015 at 9:39 PM, Mark Payne <[email protected]> wrote:
> >>
> >> Chris,
> >>
> >> In terms of round robin-ing between the repositories, yes, it follows a
> >> simple round-robin approach.
> >> In terms of sections within those containers, the answer is more of a
> >> "sort-of." Each FlowFile has what we refer to as a Resource Claim, which
> >> points to a location in the content repository. In the case of the
> >> FileSystemRepository (which is the default and almost all that's ever used
> >> right now), the Resource Claim maps to a file on disk. In order to be very
> >> efficient, we may write many FlowFiles to the same Resource Claim.
> >>
> >> Once we finish writing to a particular Resource Claim, we close the
> >> resources and create a new one for the next FlowFile. When we create these
> >> Resource Claims, we do so in a round-robin fashion across the different
> >> Sections of the content repository.
> >>
> >> Sorry, this is a fairly long-winded answer to such a seemingly simple
> >> question :) but I wasn't sure how much detail you were
> >> looking for. If anything is not clear, let us know.
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >> On Nov 25, 2015, at 5:12 AM, Chris Lim <[email protected]>
> >> wrote:
> >>
> >> Hi Guys,
> >>
> >> I am configuring our NiFi instance to have multiple content repositories,
> >> specifically with the "nifi.content.repository.directory." property setting
> >> as mentioned in the Administrator's Guide. Am I correct that FlowFile
> >> contents are written to the repositories using a round-robin algorithm? Also,
> >> do the sections within a specific content repository follow the same
> >> round-robin algorithm?
> >>
> >> Thanks,
> >> Chris
> >>
> >>
> >
>
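
For anyone following along, the behaviour described in the quoted thread
(claims created round-robin, many FlowFiles per claim at different offsets,
and a claim deleted only once nothing references it) can be sketched in Java
roughly as below. This is a toy model, not NiFi's actual code: ClaimSketch,
ContentRef, and maxClaimBytes are made-up names, and the rule for when a
claim is closed is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the claim handling described in the thread; not NiFi's real classes. */
final class ClaimSketch {

    /** Where one FlowFile's content lives: which claim, at what offset, how many bytes. */
    static final class ContentRef {
        final String claimId;
        final long offset;
        final long length;

        ContentRef(String claimId, long offset, long length) {
            this.claimId = claimId;
            this.offset = offset;
            this.length = length;
        }
    }

    private final List<String> containers;   // e.g. one per configured repository directory
    private final long maxClaimBytes;        // hypothetical threshold for closing a claim
    private final Map<String, Integer> refCounts = new HashMap<>();
    private int nextContainer = 0;
    private int claimSeq = 0;
    private String currentClaim;             // claim currently open for appends
    private long currentSize;

    ClaimSketch(List<String> containers, long maxClaimBytes) {
        this.containers = new ArrayList<>(containers);
        this.maxClaimBytes = maxClaimBytes;
    }

    /** Appends a FlowFile's content to the open claim, rolling to a new claim when full. */
    ContentRef write(long length) {
        if (currentClaim == null || currentSize >= maxClaimBytes) {
            // New claims are handed out round-robin across the containers.
            String container = containers.get(nextContainer);
            nextContainer = (nextContainer + 1) % containers.size();
            currentClaim = container + "/claim-" + (claimSeq++);
            currentSize = 0;
            refCounts.put(currentClaim, 0);
        }
        ContentRef ref = new ContentRef(currentClaim, currentSize, length);
        currentSize += length;
        refCounts.merge(currentClaim, 1, Integer::sum);
        return ref;
    }

    /**
     * Drops one reference. Returns true when the whole claim can be deleted:
     * no references remain and the claim is no longer open for writing.
     */
    boolean release(ContentRef ref) {
        int remaining = refCounts.merge(ref.claimId, -1, Integer::sum);
        if (remaining == 0 && !ref.claimId.equals(currentClaim)) {
            refCounts.remove(ref.claimId);
            return true;
        }
        return false;
    }
}
```

With two containers and a 100-byte threshold, two 60-byte writes share the
first claim at offsets 0 and 60, a third write rolls over to a claim in the
second container, and the first claim only becomes deletable after both of
its references are released.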
