Charlie,

"1. Would it be more efficient to let the files queue up, or to try to
match the process rate with timing or back pressure?"

  From an efficiency perspective each transaction has a cost, so what you
have to consider is whether you want lower latency or higher throughput.
The default settings of NiFi generally favor lower latency.  On many
processors you can slide the 'run duration' slider on the scheduling tab
to the right a bit, and the framework can then batch multiple transactions
into one over, say, a 25 millisecond period.
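
  As a rough illustration of that trade-off, here is a toy model in Python.
The per-event and per-commit costs are made-up numbers, not measurements of
NiFi; the only point is that a fixed commit cost shared by a batch raises
throughput, while each event may wait longer before it is committed.

```python
def throughput_per_second(per_event_ms, commit_overhead_ms, events_per_commit):
    """Events/second for one thread when a fixed commit cost is shared by a batch."""
    batch_ms = events_per_commit * per_event_ms + commit_overhead_ms
    return events_per_commit * 1000.0 / batch_ms

# Hypothetical costs, chosen only to show the shape of the trade-off.
PER_EVENT_MS = 0.05
COMMIT_OVERHEAD_MS = 1.0

one_at_a_time = throughput_per_second(PER_EVENT_MS, COMMIT_OVERHEAD_MS, 1)
batched = throughput_per_second(PER_EVENT_MS, COMMIT_OVERHEAD_MS, 100)
print(f"one event per commit:  {one_at_a_time:,.0f} events/s")
print(f"100 events per commit: {batched:,.0f} events/s")
```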

"2. I've made the suggested system edits in the administrator's guide, as
well as increasing xms and xmx somewhat.  Any additional suggestions?"

  If the flowfile repository, content repository, provenance repository,
logs, and OS are all on the same physical partition, performance will
suffer.  If you're trying to achieve higher events/second or higher MB/s
of data into and out of the system as a whole, separating them is a great
place to start.  On rather modest enterprise servers you should expect
tens if not hundreds of thousands of events per second and more than
100 MB/s of throughput.
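
  One quick way to see what is sharing a partition is to compare the device
id of each directory.  The sketch below assumes the repositories and logs
sit in the default locations relative to the directory you run it from;
adjust the paths to match your nifi.properties.  Note that it compares
filesystems, so two partitions carved from one physical disk will still
look separate.

```python
import os

def group_by_device(paths):
    """Map each filesystem device id to the names of the paths that live on it."""
    groups = {}
    for name, path in paths.items():
        device = os.stat(path).st_dev
        groups.setdefault(device, []).append(name)
    return groups

def shared_devices(paths):
    """Return the groups of names (sorted) that share one filesystem."""
    return [sorted(names) for names in group_by_device(paths).values()
            if len(names) > 1]

# Assumed default layout; these paths are illustrative, not read from NiFi.
REPOSITORIES = {
    "flowfile": "./flowfile_repository",
    "content": "./content_repository",
    "provenance": "./provenance_repository",
    "logs": "./logs",
}

# Skip anything that does not exist so the script runs from any directory.
existing = {name: path for name, path in REPOSITORIES.items()
            if os.path.isdir(path)}
for names in shared_devices(existing):
    print("same partition:", ", ".join(names))
```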

"3. Somewhat related, I think that logging / provenance is eating up disk
space.  My install directory stays large, even after processing has
finished, until I stop and start NiFi which significantly reduces the disk
in use."

  The default provenance and logging settings will eat up disk space and
increase disk utilization.  You can turn logging down to a less verbose
level so that less is logged, and you can flip provenance into an
in-memory mode if you need or want to.
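
  Before changing anything it is worth confirming which directory is
actually growing.  A minimal sketch, assuming you run it against the NiFi
install directory ("." below is only a placeholder):

```python
import os

def directory_size_bytes(root):
    """Total size in bytes of the files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, filename)).st_size
            except OSError:
                pass  # file was rotated or removed while we were walking
    return total

def size_report(install_dir):
    """Return (name, bytes) for each immediate subdirectory, largest first."""
    sizes = []
    for entry in os.scandir(install_dir):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, directory_size_bytes(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)

# "." is a placeholder; point this at the NiFi install directory.
for name, size in size_report("."):
    print(f"{size / 1e6:10.1f} MB  {name}")
```

Running it once while the flow is busy and once after it has drained
should show whether logs or provenance account for the growth.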

"4. Maybe unrelated, what does the number that appears in a little white
box at the top right of a processor indicate?  It seems to show up on
processors that have a large queue in front of them."

  That number tells you how many threads the framework currently has
active for that processor.  It makes sense that it often shows up on
processors that have a queue in front of them.

The key thing to understand is which aspects of resource utilization are
high:
- CPU?  If so, how are you measuring that?
- Network?  Do you have a 1 Gb/s NIC and are you seeing less than 120 MB/s
throughput?
- Memory?  How large is the system memory and how large a heap have you
allocated to Java?
- Disk?  How many physical disks do you have and what is the utilization
while running?  How are you measuring this?
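
For reference, the 120 MB/s figure is just the line rate of a gigabit link
converted to bytes, less a little protocol overhead.  The arithmetic, as a
small sketch (the function name is only illustrative):

```python
def nic_ceiling_mb_per_s(link_gbit_per_s):
    """Theoretical ceiling in MB/s for a link speed given in Gb/s (8 bits per byte)."""
    return link_gbit_per_s * 1000 / 8

print(nic_ceiling_mb_per_s(1))   # 125.0, before protocol overhead
```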

Questions:
- What performance are you aiming for, and how far off is that from what
you're seeing?
- Do you have a flow template you can share so we can help identify any
potential problems?

Thanks
Joe

On Sun, Dec 6, 2015 at 1:23 AM, Charlie Frasure <[email protected]>
wrote:

> I sent this from the wrong email account a few days ago, but am still
> interested in any thoughts.
>
> Got an interesting message that prompted me to follow up on a few
> questions I have.  The message is the screenshot below (if it works).  It
> says: "WARNING The rate of the dataflow is exceeding the provenance
> recording rate.  Slowing down flow to accommodate."
>
> [image: Inline image 1]
>
> This flow has queued up a few hundred thousand files (mostly very small)
> and I'm not sure that's ideal.  I read that there is some automatic
> swapping that takes place at 20k file queues.  It does eventually process
> the files, but I would like to make sure we're taking advantage of any
> performance options.
>
> 1. Would it be more efficient to let the files queue up, or to try to
> match the process rate with timing or back pressure?
>
> 2. I've made the suggested system edits in the administrator's guide, as
> well as increasing xms and xmx somewhat.  Any additional suggestions?
>
> 3. Somewhat related, I think that logging / provenance is eating up disk
> space.  My install directory stays large, even after processing has
> finished, until I stop and start NiFi which significantly reduces the disk
> in use.
>
> 4. Maybe unrelated, what does the number that appears in a little white
> box at the top right of a processor indicate?  It seems to show up on
> processors that have a large queue in front of them.
>
> [image: Inline image 2]
>
> Thanks,
> Charlie
>
>
>
